
Basically, I want to process a list of items in multiple threads instead of one at a time. I only want a limited number of threads going at a time. Does this approach make sense? Is using a global variable for the thread count the only option? (pseudo-code below)

foreach item in list
    while thread_count >= thread_max
        sleep
    loop
    thread_count++
    start_thread item
next

function start_thread(item)
    do_something_to item
    thread_count--
end function
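In C#, a minimal sketch of this same throttling pattern (illustrative names; this assumes .NET 4.0's SemaphoreSlim) replaces the shared counter with a semaphore, which sidesteps the global-variable question entirely:

```csharp
using System.Collections.Generic;
using System.Threading;

static class ThrottledExample
{
    // Caps concurrency at threadMax without a shared counter variable.
    public static void ProcessAll(IEnumerable<string> list, int threadMax)
    {
        var gate = new SemaphoreSlim(threadMax);
        var threads = new List<Thread>();
        foreach (var item in list)
        {
            gate.Wait(); // blocks until one of the threadMax slots is free
            var t = new Thread(state =>
            {
                try { DoSomethingTo((string)state); }
                finally { gate.Release(); } // free the slot even if the work throws
            });
            t.Start(item); // pass the item explicitly rather than via closure capture
            threads.Add(t);
        }
        foreach (var t in threads) t.Join(); // wait for everything to finish
    }

    static void DoSomethingTo(string item) { /* do_something_to item */ }
}
```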
Ed Manet
  • Mechanisms for this come OOTB with .NET 4.0. :) – bzlm Mar 02 '11 at 22:07
  • Why would you want to control the number of threads? let the framework decide optimally by using [Threadpool](http://msdn.microsoft.com/en-us/library/system.threading.threadpool(v=VS.100).aspx). – Sanjeevakumar Hiremath Mar 04 '11 at 08:18
  • So I would be able to run through the list and use QueueUserWorkItem on each item without worrying about how many threads are running? The pool would manage it for me? Interesting... – Ed Manet Mar 04 '11 at 18:56
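To sketch what the comments suggest (illustrative only; this pairs ThreadPool.QueueUserWorkItem with a .NET 4.0 CountdownEvent so the caller can wait for completion):

```csharp
using System.Threading;

static class PoolExample
{
    // Queue every item and let the ThreadPool decide how many run at once.
    public static void ProcessAll(string[] items)
    {
        using (var done = new CountdownEvent(items.Length))
        {
            foreach (var item in items)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try { /* do_something_to (string)state */ }
                    finally { done.Signal(); } // count this item as finished
                }, item);
            }
            done.Wait(); // blocks until all items have signaled
        }
    }
}
```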

2 Answers


I was originally going to suggest PLINQ with a max degree of parallelism, but since you just want to process a raw list directly and you're not doing any other filtering or mapping (Where/Select), it would be better to use Parallel::ForEach and specify the MaxDegreeOfParallelism via ParallelOptions like so:

int myMaxDegreeOfParallelism = 4; // read this from config maybe

Parallel.ForEach(
    list,
    new ParallelOptions
    {
        MaxDegreeOfParallelism = myMaxDegreeOfParallelism
    },
    item =>
    {
        // ... your work here ...
    });

Now, keep in mind, when you specify a max like this you prevent the TPL from using any more resources even if they're available. So if this ran on an 8 core machine, it would never utilize more than 4 cores. Conversely, just because you specified 4 doesn't mean 4 are guaranteed to execute simultaneously at any given time. It all depends on several heuristics that the TPL uses to be optimal.
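For reference, the PLINQ approach this answer originally considered would look roughly like this (a sketch; IsInteresting and Transform are hypothetical placeholders for your own filtering/mapping steps):

```csharp
using System.Linq;

// Cap PLINQ's parallelism the same way; this shape is useful when you
// do have Where/Select steps in the pipeline.
var results = list
    .AsParallel()
    .WithDegreeOfParallelism(4)
    .Where(item => IsInteresting(item)) // hypothetical filter
    .Select(item => Transform(item))    // hypothetical mapping
    .ToList();
```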

Drew Marsh

It makes sense, but I hope you're aware that this isn't the usual way to do it unless you have very specific performance reasons or are stuck on .NET 3.5. Normally you would use Parallel.ForEach over the elements in the list, and rely on the partitioner to divide up the work into appropriate chunks.

Even if you didn't have the TPL, it would be more idiomatic to divide up all the work and hand each thread a big chunk of work at once, rather than doling it out piecemeal at the moment a thread finishes. The only reason to do it your way is if you expected the amount of time a given work item takes to be more or less unpredictable, so you couldn't divide up the work well in advance.

(Also, you could just keep references to the threads and check how many are still working and how many have completed. That would do away with the global variable.)
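A rough sketch of that chunking idea (illustrative names; each thread gets one contiguous slice of the list up front rather than pulling items one at a time):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ChunkExample
{
    // Hand each thread one contiguous slice of the list up front.
    public static void ProcessInChunks(IList<string> list, int threadCount)
    {
        int chunkSize = (list.Count + threadCount - 1) / threadCount; // ceiling division
        var threads = new List<Thread>();
        for (int start = 0; start < list.Count; start += chunkSize)
        {
            int s = start; // copy the loop variable for the closure
            var t = new Thread(() =>
            {
                int end = Math.Min(s + chunkSize, list.Count);
                for (int i = s; i < end; i++)
                    DoSomethingTo(list[i]);
            });
            t.Start();
            threads.Add(t);
        }
        foreach (var t in threads) t.Join();
    }

    static void DoSomethingTo(string item) { /* do_something_to item */ }
}
```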

mqp