Andy in the Cloud

From BBC Basic to Force.com and beyond…

Batch Worker, Getting more done with less work…

12 Comments

Batch Apex has been around on the platform for a while now, but I think its fair to say there is still a lot of mystery around it and with that a few baked in assumptions. One such assumption I see being made is that its driven by the database, specifically the records within the database determine the work to be done.

construction_workerAs such if you have some work you need to get done that won’t fit in the standard governors and its not immediately database driven, Batch Apex may get overlooked in favour of @future which on the surface feels like a better fit as its design is not database linked in anyway .  Your code is just an annotation away to getting the addition power it needs! So why bother with the complexities of Batch Apex?

Well for starters, Batch Apex gives you an ID to trace the work being done and thus the key to improving the user experience while the user waits. Secondly, if any of your parameters are lists or arrays to such methods, your already having to consider again scalability. Yes, you say, but its more fiddly than @future isn’t it?

In this blog I’m going to explore a cool feature of the Batch Apex that often gets overlooked. Using it to implement a worker pattern giving you the kind of usability @future offers with the additional scalability and traceability of Batch Apex without all the work. If your not interested in the background, feel free to skip to the Batch Worker section below!

IMPORTANT NOTE: The alternative approach described here is not designed as a replacement to using Batch Apex against the database using QueryLocator. Using QueryLocator gives access to 50m records, where as the Iterator usage only 50k. Thus the use cases for the Batch Worker are more aligned with smaller jobs perhaps driven by end user selections or stitching together complex chunks of work together.

Well I didn’t know that! (#WIDKT)

First lets review something you may not have realised about implementing Batch Apex. The start method can return either a QueryLocator or something called Iterable. You can implement your own iterators, but what is actually not that clear is that Apex collections/lists implement Iterator by default!

Iterable<String> i = new List<String> { 'A', 'B', 'C' };

With this knowledge, implementing Batch Apex to iterate over a list is now as simple as this…

public with sharing class SimpleBatchApex implements Database.Batchable<String>
{
	public Iterable<String> start(Database.BatchableContext BC)
	{
		return new List<String> { 'Do something', 'Do something else', 'And something more' };
	}

	public void execute(Database.BatchableContext info, List<String> strings)
	{
		// Do something really expensive with the string!
		String myString = strings[0];
	}

	public void finish(Database.BatchableContext info) { }
}

// Process the String's one by one each with its own governor context
Id jobId = Database.executeBatch(new SimpleBatchApex(), 1);

The second parameter of the Database.executeBatch method is used to determine how many items from the list are pass to each execute method invocation made by the platform. To get the maximum governors per item and match that of a single @future call, this is set 1.  We can also implement Batch Apex with a generic data type know as Object. Which allows you to process different types or actions in one job, more about this later.

public with sharing class GenericBatchApex implements Database.Batchable<Object>
{
	public Iterable<Object> start(Database.BatchableContext BC) { }

	public void execute(Database.BatchableContext info, List<Object> listOfAnything) { }

	public void finish(Database.BatchableContext info) { }
}

A BatchWorker Base Class

The above simplifications are good, but I wanted to further model the type of flexibility @future gives without dealing with the Batch Apex mechanics each time. In designing the BatchWorker base class used in this blog i wanted to make its use as easy as possible. I’m a big fan of the fluent API model and so if you look closely you’ll see elements of that here as well. You can view the full source code for the base class here, its quite a small class though, extending the concepts above to make a more generic Batch Apex implementation.

First lets take another look at the string example above, but this time using the BatchWorker base class.

public with sharing class MyStringWorker extends BatchWorker
{
	public override void doWork(Object work)
	{
		// Do something really expensive with the string!
		String myString = (String) work;
	}
}

// Process the String's one by one each with its own governor context
Id jobId =
	new MyStringWorker()
            .addWork('Do something')
            .addWork('Do something else')
            .addWork('And something more')
            .run()
            .BatchJobId;

Clearly not everything is as simple as passing a few strings, after all @future methods can take parameters of varying types. The following is a more complex example showing a ProjectWorker class. Imagine this is part of a Visualforce controller method where the user is presented a selection of projects to process with a date range.

	// Create worker to process the project selection
	ProjectWorker projectWorker = new ProjectWorker();
		
	// Add the work to the project worker
	for(SelectedProject selectedProject : selectedProjects)		
		projectWorker.addWork(startDate, endDate, selectedProject.projectId);
			
	// Start the workder and retain the job Id to provide feedback to the user
	Id jobId = projectWorker.run().BatchJobId;		

Here is how the ProjectWorker class has been implemented, once again it extends the BatchWorker class. But this time it provides its own addWork method which takes the parameters as you would normally describe them. Then internally wraps them up in a worker data class. The caller of the class, as you’ve seen above is is not aware of this.

public with sharing class ProjectWorker extends BatchWorker
{	
	public ProjectWorker addWork(Date startDate, Date endDate, Id projectId)
	{
		// Construct a worker object to wrap the parameters		
		return (ProjectWorker) super.addWork(new ProjectWork(startDate, endDate, projectId));
	}
	
	public override void doWork(Object work)
	{
		// Parameters
		ProjectWork projectWork = (ProjectWork) work;
		Date startDate = projectWork.startDate;
		Date endDate = projectWork.endDate;
		Id projectId = projectWork.projectId;		
		// Do the work
		// ...
	}
	
	private class ProjectWork
	{
		public ProjectWork(Date startDate, Date endDate, Id projectId)
		{
			this.startDate = startDate;
			this.endDate = endDate;
			this.projectId = projectId;
		}
		
		public Date startDate;
		public Date endDate;
		public Id projectId;
	}
}

As a final example, recall the fact that Batch Apex can process a list of generic data types. The BatchProcess base class uses this to permit the varied implementations above. It can also be used to create a worker class that can do more than one thing. The equivalent of implementing two @future methods, accept that its managed as one job.

public with sharing class ProjectMultiWorker extends BatchWorker 
{
	// ...

	public override void doWork(Object work)
	{
		if(work instanceof CalculateCostsWork)
		{
			CalculateCostsWork calculateCostsWork = (CalculateCostsWork) work;
			// Do work 
			// ...					
		}
		else if(work instanceof BillingGenerationWork)
		{
			BillingGenerationWork billingGenerationWork = (BillingGenerationWork) work;
			// Do work
			// ...		
		}
	}
}

// Process the selected Project 
Id jobId = 
	new ProjectMultiWorker()
		.addWorkCalculateCosts(System.today(), selectedProjectId)
		.addWorkBillingGeneration(System.today(), selectedProjectId, selectedAccountId)
		.run()
		.BatchJobId;

Summary

Hopefully I’ve provided some insight into new ways to access the power and scalability of Batch Apex for use cases which you may not have previously considered or perhaps used less flexible @future annotation. Keep in mind that using Batch Apex with Iterators does reduce the number of items it can process to 50k, as apposed to the 50m when using database query locator. At the end of the day if you have more than 50k work items, your probably wanting to go down the database driven route anyway. I’ve shared all the code used in this article and some I’ve not shown in this Gist.

Post Credits
Finally, I’d like to give a nod to an past work associate of mine, Tony Scott, who has taken this type of approach down a similar path, but added process control semantics around it. Check out his blog here!

12 thoughts on “Batch Worker, Getting more done with less work…

  1. When Database.Batchable is used to process SObjects (which is a common case), returning a QueryLocator from the start method is documented (at least here http://www.salesforce.com/us/developer/docs/apexcode/Content/apex_batch_interface.htm) to have the affect that “the governor limit for the total number of records retrieved by SOQL queries is bypassed”. That seems a compelling advantage over returning an Iterable from the start method, and so I question whether your BatchWorker pattern should ever be used for SObjects? None of us want to hit governor limits as data volumes grow in orgs…

    • Yes totally agree, as per my summary, you get 50k with Iterators and 50m (corrected from 5m) with QueryLocator. The aim here is to provide an easier Batch Apex implementation in cases where @future may have been previously considered and/or the work needed may not be directly related to processing records in the database. For example actions triggered by UI selections.

      • Fair enough if its for non-SObject processing. Also AFAIK there is no 5m limit for QueryLocator but point me to the documentation if I’m wrong.

    • No worries, thanks for the feedback, I’ve made a further clarification in the blog. It actually turns out, having checked the docs, its 50m with a QueryLocator. So yes, quite an increase, you’ll more than likely be driving a job from the database with those volumes, since you’d probably not get 50m items in the heap! 🙂

  2. Thanks for writing about this alternative approach to push the Limits.. I have created and successfully used similar solutions in the past e.g. use Batch to copy millions of records from org 1 to org 2 using JSON over callouts. I used an Iterable to not run into the callout limit.

    Besides that I am currently working on a similar custom SObject Work Queue, which receives Command-Pattern like Work Orders, persits them in the database and executes them in a generic batch “queue”when there is time to do it.

    If you are interested to see and maybe discuss about this implementation let me share an overview UML here:

  3. Sure, I will setup a Github project in the next few days and post this via Google+ and here in you blog. I would love to get your feedback.

  4. Many thanks, this approach was perfect for my particular situation, which was a good example of why you might want to use iterable sObjects in a Batch. I had a couple thousand sObjects that needed to be updated; they represented a queue of manufacturing and deployment projects that needed to be matched up against shipping schedules. I was only updating a couple thousand sObjects . . . but they in turn fired triggers that updated due dates on associated tasks, and updated related legal-compliance objects that also had their own associated tasks to be rescheduled, etc. Even with highly optimized triggers, the cascade of triggers quickly led to blowing out the 10K limit on DML rows. I tried using @future calls to spread out the load, but quickly hit the Apex CPU time limit. I knew that I would ultimately be driven to use a Batch, but a QueryLocator couldn’t capture the logic for all the records I needed to update. A Batch with an Interable list of sObjects let me pass my list of sObjects in and process them in optimally sized chunks to avoid governor limits but still process them in big chunks.

  5. Fantastic story thanks for sharing!

Leave a comment