John Ratcliff's Code Suppository

A place where I insert my code into the anus of the Internet.

 

Friday, January 02, 2009

JobSwarm : A microthreading framework with lockless FIFO



(I'm trying to use Google Code for the first time. You can try getting the code from there at this link. Please read the latest post about JobSwarm on this site.)

I used part of my Christmas break to implement a system that I have been wanting for a long time. I call it a 'JobSwarm'. What it is, is a micro-threading library that can handle tens of thousands of extremely tiny jobs with almost no overhead.

In the sample program provided I compute a fractal by decomposing it into 65,536 individual jobs, each one representing 64 pixels of a 2kx2k image.

The usage pattern for the JobSwarm is as follows.

First, you create a JobSwarmContext. This is a single instance of the JobSwarm system. When you create a JobSwarmContext you tell it how many physical threads you want to use. You can either create a single JobSwarmContext for your entire application, or you can create different ones in various subsystems.

When a JobSwarmContext is released any currently executing jobs will have to be completed before the method can return.

To use the JobSwarm you simply inherit the pure-virtual interface 'JobSwarmInterface' and implement three methods.

They are:

job_process
job_onFinish
job_onCancel

You find this interface in the header file 'JobSwarm.h' and enclosed in the namespace 'JOB_SWARM'.

A simple example of how to use this is in 'fractal.cpp' and in the class 'FractalJob'.

The three methods are implemented as follows:

job_process : This is the actual 'work' this job is supposed to do. This code will operate in a seperate thread and, therefore, must be entirely thread safe.

job_onFinish : This is a method which is called when your job has completed. This method is invoked in the main thread (the thread which owns the JobSwarmContext). This allows you to do anything with results you may have computed during the 'job_process' call.

job_onCancel : This is a methd which is called when your job was cancelled. This method is called from the main thread.

To manage the job swarm you use the following methods on the JobSwarmContext.

createSwarmJob : This creates a single job to be processed.

cancel : This cancels a previously posted job. It is very important to note that this only marks the job to be cancelled! You cannot destruct the job until you recieve the 'job_onCancel' callback.

processSwarmJobs : This is a 'pump' loop method which is called from the main thread. This call allows completed and/or cancelled jobs to be despooled. This is a very lightweight call but if you forget to invoke it none of your job completion events will ever occur.

setUseThreads : By default the job swarm system runs using hardware threads. However, for debugging and/or performance comparison tests, it can be toggled on and off on the fly. Setting 'useThreads' to false does not close out the hardware threads themselves, it simply moves the 'work' of the individual jobs into the main thread when you call the 'processSwarmJobs' pump method.

I am releasing this JobSwarm code for a reason. First, I hope other people might adopt it and improve the code and I would, thereby, benefit from it. I was also frustrated in my efforts to find a very tiny and simple micro-threading solution by searching on the net.

This system could use some improvements. For example, in addition to needing a true lock-free FIFO, it also might be nice if it supported job priorities.

It also might be interesting to support inter-job communication, if that sort of thing made sense and/or was useful.

I am making this code available in two forms.

First, as a small console application with just a tiny amount of source code.

You can download it from JobSwarm.zip

The source included with 'JobSwarm.zip' is as follows:

fractal.cpp and fractal.h : A very simple fractal routine which can solve the fractal in a linear fashion or by breaking it up into 65,536 distinct 'jobs'.

JobSwarm.cpp and JobSwarm.h : The main JobSwarm system. Has no OS dependent code in it.

LockFreeQ.cpp and LockFreeQ.h : Provides a lock-free circular queue for a single provider / single consumer system. Also provides an API for a lock free FIFO, however the implementation currently does use locks.

main.cpp : The console demo/test app.

pool.h : A double linked list template class used by the JobSwarm to manage jobs. Minimizes memory allocations.

sgif.cpp and sgif.h : A snippet of code to save a block of memory as a GIF image.

ThreadConfig.cpp and ThreadConfig.h : A wrapper layer to hide OS dependencies for threading related calls. Intended to be multi-platform but the current implementation is only for Windows.

The second application is a larger framework called 'ThreadFrac', which provides an interactive graphics application which allows you to visually see the performance differences when toggling the job swarm on and off on multi-core machines.

You can download and install it from ThreadFrac.exe

I look forward to bug-fixes, improvements, and feedback from the development community.

Thanks,

John

--------------------- Revision update January 3, 2009 --------------

I made the following important revisions to the JobSwarm.zip download.

  • I removed the source files 'fractal.cpp' and 'fractal.h'
  • I revised the 'main.cpp' to act as a much simpler example program on how to use the JobSwarm system. Simply look through and/or step through the code that is there.
  • I created two separate examples of how to use the JobSwarm system.
  • First : One class with a callback interface is created for each separate job. This may be the most common use case but requires more setup and memory allocation in some use cases.
  • Second: A single class receives callbacks for all of the 'jobs' and it uses the user data and user id field to figure out just what needs to be calculated in that context.
  • I added support for a userData and userId field for jobs. This extra meta data is available as an optional additional information to pass into the job service callbacks.
  • I fixed a crash bug that would happen occasionally when trying to deconstruct the JobSwarmContext
  • This version was built with VC2008. If you have an earlier version of VC you won't be able to load the solution and project files. Simply create a new console app and add the source files, that is all you need to build the example program.



Thanks to everyone who previously downloaded the sample and gave me feedback. It was all very useful.

17 Comments:

  • At 2:05 PM, Anonymous Anonymous said…

    It seems very interesting. I'll follow the subject and give my feelings :)

     
  • At 4:28 PM, Blogger Vince said…

    I was about to sit down and write something like this, so this was very serendipitous. Saves me some time. Thanks! I'll give more feedback when I've had time to play around with it.

     
  • At 6:11 AM, Blogger _jj said…

    Thanks for this, looks very interesting! However, I think there's a bug: JobSwarm.exe crashes sometimes on my machine. I compiled a debug-version and found the crashing line:
    JOB_SWARM::SwarmJob::isCancelled() Line 99 + 0x26 bytesC++
    JOB_SWARM::ThreadWorker::threadMain() Line 357 + 0xb bytesC++
    THREAD_CONFIG::MyThread::onJobExecute() Line 199 + 0x14 bytesC++
    THREAD_CONFIG::_ThreadWorkerFunc(void * arg=0x003b2be8) Line 224C++

    Looks like "mSwarmJob" is not initialized? Crash happens when "computing with one core" has already started. Other worker threads are all waiting at: THREAD_CONFIG::MyThreadEvent::waitForSingleObject(unsigned int ms=10) Line 256 + 0x13 bytes C++
    JOB_SWARM::ThreadWorker::threadMain() Line 382 + 0x19 bytesC++
    THREAD_CONFIG::MyThread::onJobExecute() Line 199 + 0x14 bytesC++
    THREAD_CONFIG::_ThreadWorkerFunc(void * arg=0x01b669d0) Line 224C++

    I don't have more time to dig into the problem right now :(. I hope someone can provide a fix sooner...

     
  • At 3:46 PM, Blogger John said…

    I've made bug fixes, see the latest post for info.

     
  • At 10:39 AM, Blogger _jj said…

    Thanks, does not crash anymore. I took a deeper look at this version and I'm curious about one thing:
    ::CreateEventA(NULL,TRUE,TRUE,"ThreadEvent");
    Doesn't this actually point to a single event for all threads? Even those in different JobSwarmContext? Makes me wonder if the name should be unique or NULL...

     
  • At 10:31 PM, Blogger Kim, Hyoun Woo said…

    Interesing. BTW where do you plan to use this? Some of deveoloper I know used a similiar thing - some kind of micro thereading system to NPC A.I. for his game. As he said, it worked fine but hard to debug when there was any problems are.

    Just wonder, what's your plan.

    And happy new year - I really enjoy your blog - nice blog. :-)

     
  • At 12:22 AM, Anonymous Jamie said…

    Really good article, had a read over Ladan-Mozes and Shavit's paper and it seems an interesting approach. Haven't quite been able to get it work in C# however. Would you be willing to distribute the C source if she contacts you?

    Regards,
    Jamie

     
  • At 8:59 AM, Blogger John said…

    Please read the latest posting with updated information about this little project.

     
  • At 9:26 PM, Blogger asynclib said…

    Hello guys.

    You will probably be interested to check this out:

    http://groups.google.ru/group/asynclib?hl=ru

     
  • At 10:55 PM, Blogger John said…

    Asynclib,

    Looks like a nice package but, unfortunately, it doesn't come with source and the license is not usable.

    John

     
  • At 11:22 PM, Blogger asynclib said…

    >and the license is not usable

    What do you mean by that ?

     
  • At 6:08 AM, Blogger John said…

    On the site it says there is a separate commercial license. JobSwarm is open source and free to use in commercian and non-commercial software and does not have a 'viral-license' like GPL.

    John

     
  • At 7:02 AM, Anonymous Anonymous said…

    WRT: C#. The follow-up paper has a Java impl, work from that.

     
  • At 12:05 PM, Blogger pachanga said…

    I'm just wondering - have you seen Intel Thread Building Blocks?

     
  • At 3:35 PM, Blogger John said…

    Yes, the Intel Threading library is quite awesome. Unfortunately it is released under GPL which I cannot use.

    John

     
  • At 5:19 PM, Anonymous Andrew Phillips said…

    Hi John,

    I'm a long time lurker, first time writer.

    1st. Thanks for the suppository, Its hugely useful.

    2nd. The lock free queue uses a trick of exchange to get the value of tail (LockFreeQ.cpp:105).
    Is Tail not read as it enters the function?
    Or is it because its going into an atomic function that the reads will be in the specified order?
    (I'm guessing the cmpxchg8b will act like a read/write barrier for both the compiler and cpu.)

     
  • At 2:48 AM, OpenID alexuol said…

    Hello,
    (Sorry for some gravedigging)

    JobSwarm looks very similar to thread pool pattern for me. Am I correct?

    If yes, I hope You wouldnt mind that I posted next link: http://threadpool.sourceforge.net/

    It is thread pool implementation for c++. In a very boost style.

     

Post a Comment

<< Home