
Right now, I've got a C# program that performs the following steps on a recurring basis:

  • Grab current list of tasks from the database
  • Using Parallel.ForEach(), do work for each task

However, some of these tasks are very long-running. This delays the processing of other pending tasks because we only look for new ones at the start of the program.

Now, I know that modifying the collection being iterated over isn't possible (right?), but is there some equivalent functionality in the C# Parallel framework that would allow me to add work to the list while also processing items in the list?

Eric
  • Do you use a timer to grab tasks from the database? If so, add a status column to the task table and mark the task as "processing". Once the task is completed, mark it as "complete". So within your timer you should only grab the tasks that have a null status. Regarding the tasks list, it should be declared within your "ProcessTasks" method, which means every time the timer runs, you get a new list to work with. – Kosala W Nov 11 '15 at 20:55
  • Forgive me, but I don't quite see how that solves the issue. We do indeed grab tasks on a timer and have the Processing and Complete columns. The issue is that once we've grabbed the list of tasks and start processing them with Parallel.ForEach, we're stuck inside that ForEach until all the processing is complete. I suppose we could throw the Parallel processing into a Task() so that the main thread could continue its timer. – Eric Nov 11 '15 at 21:00
  • Parallel.ForEach should spawn a thread for each task. Which means if you have 5 tasks, they should get processed independently and you should come out of the loop as soon as those 5 threads are started, right? Have you looked at thread-safe [collections](https://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx)? It may help you. – Kosala W Nov 11 '15 at 21:07
  • Are you positive that the ForEach returns once it has started each of its threads? I could swear that it blocks until all of its threads are done. – Eric Nov 11 '15 at 21:12
  • @KosalaW `Parallel.ForEach` will not complete until it has enumerated every element or it has been interrupted. It is much like a traditional `foreach` loop but *may* execute in parallel. https://msdn.microsoft.com/en-us/library/dd992001%28v=vs.110%29.aspx – TheInnerLight Nov 11 '15 at 21:21
  • @TheInnerLight: You are correct. I just had a look at one of my code bases. This is what I have done: Task.Factory.StartNew(() => { Parallel.ForEach(tasks, t => ProcessTasks); }); So the calling thread does not get blocked while the Parallel.ForEach completes. – Kosala W Nov 11 '15 at 21:32

2 Answers


Here is an example of an approach you could try. I think you want to move away from Parallel.ForEach and use asynchronous programming instead, because you need to retrieve results as they finish rather than in discrete chunks that could conceivably contain both long-running tasks and tasks that finish very quickly.

This approach uses a simple sequential loop to retrieve results from a list of asynchronous tasks. In this case, you should be safe to use a simple non-thread safe mutable list because all of the mutation of the list happens sequentially in the same thread.

Note that this approach uses Task.WhenAny in a loop which isn't very efficient for large task lists and you should consider an alternative approach in that case. (See this blog: http://blogs.msdn.com/b/pfxteam/archive/2012/08/02/processing-tasks-as-they-complete.aspx)

This example is based on: https://msdn.microsoft.com/en-GB/library/jj155756.aspx

private async Task<ProcessResult> processTask(ProcessTask task) 
{
    // do something intensive with data
}

private IEnumerable<ProcessTask> GetOutstandingTasks() 
{
    // retrieve some tasks from db
}

private async Task ProcessAllData()
{
    List<Task<ProcessResult>> taskQueue = 
        GetOutstandingTasks()
        .Select(tsk => processTask(tsk))
        .ToList(); // grab initial task queue

    while(taskQueue.Any()) // iterate while tasks need completing
    {
        Task<ProcessResult> firstFinishedTask = await Task.WhenAny(taskQueue); // get first to finish
        taskQueue.Remove(firstFinishedTask); // remove the one that finished
        ProcessResult result = await firstFinishedTask; // get the result
        // do something with task result
        taskQueue.AddRange(GetOutstandingTasks().Select(tsk => processTask(tsk))); // add more tasks that need performing
    }
}
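
For larger task lists, the blog post linked above describes a more efficient pattern: attach a continuation to each task as it is created, so each result is handled the moment that task finishes, instead of rescanning the whole list with Task.WhenAny on every iteration. A minimal sketch of that idea, assuming the same processTask/GetOutstandingTasks signatures as above (HandleWhenDone is a hypothetical helper, not part of the original code):

```csharp
private async Task ProcessAllDataAlternative()
{
    var inFlight = new List<Task>();
    foreach (var tsk in GetOutstandingTasks())
    {
        // Wrap each task so its result is consumed as soon as it completes,
        // avoiding the O(n) Task.WhenAny scan per completion.
        inFlight.Add(HandleWhenDone(processTask(tsk)));
    }
    await Task.WhenAll(inFlight); // wait for every wrapped task to drain
}

private async Task HandleWhenDone(Task<ProcessResult> pending)
{
    ProcessResult result = await pending;
    // do something with task result
}
```

Note this sketch still only picks up new work when ProcessAllDataAlternative is invoked again; it addresses the WhenAny efficiency caveat, not the polling cadence.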
TheInnerLight
  • Your answer is much appreciated, but I don't think it gets at the core problem I'm having. I don't want to process results that are done earlier sooner (indeed, there's no post-processing to be done). Rather, I'm looking to process a list of tasks in parallel and, while that processing is running, add more tasks to the list (to also be processed in parallel). The await Task.WhenAny call will block until the first task is done. If that task is long-running, then it will be a long time until we do a GetOutstandingTasks() call, exactly the problem I'm having now. – Eric Nov 11 '15 at 22:48
  • @Eric `Task.WhenAny` won't block until the first task is done; it will return as soon as *any* task in the list is done. That means you can complete and deal with lots of fast running tasks while waiting for the results of long running tasks. Long running tasks won't feature until they complete. – TheInnerLight Nov 11 '15 at 22:53
  • @Eric I updated my answer, hopefully it is now worded more clearly. – TheInnerLight Nov 11 '15 at 22:58
  • So, you query for new tasks only when a task completes? That doesn't sound like a good approach to me, when the question explicitly states that some tasks can take a very long time. For example, if there is only one such task currently being processed, your code would wait a long time before getting new tasks. – svick Nov 12 '15 at 11:01
  • @svick That's exactly my issue. A few long-running tasks are holding up us handling some potentially short-running ones. – Eric Nov 12 '15 at 14:16

Generally speaking, you're right that modifying a collection while iterating it is not allowed. But there are other approaches you could be using:

  • Use ActionBlock<T> from TPL Dataflow. The code could look something like:

    var actionBlock = new ActionBlock<MyTask>(
        task => DoWorkForTask(task),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
    
    while (true)
    {
        var tasks = GrabCurrentListOfTasks();
        foreach (var task in tasks)
        {
            actionBlock.Post(task);
    
            await Task.Delay(someShortDelay);
            // or use Thread.Sleep() if you don't want to use async
        }
    }
    
  • Use BlockingCollection<T>, which can be modified while consuming items from it, along with GetConsumingPartitioner() from ParallelExtensionsExtras to make it work with Parallel.ForEach():

    var collection = new BlockingCollection<MyTask>();
    
    Task.Run(async () =>
    {
        while (true)
        {
            var tasks = GrabCurrentListOfTasks();
            foreach (var task in tasks)
            {
                collection.Add(task);
    
                await Task.Delay(someShortDelay);
            }
        }
    });
    
    Parallel.ForEach(collection.GetConsumingPartitioner(), task => DoWorkForTask(task));
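
    One detail worth calling out (an addition, not part of the original answer): GetConsumingPartitioner(), like the built-in GetConsumingEnumerable(), keeps enumerating until the collection is marked complete, so the Parallel.ForEach above never returns while the producer loop runs forever. If you ever need a clean shutdown, have the producer call CompleteAdding(). A self-contained sketch using the built-in GetConsumingEnumerable() (the hypothetical DoWorkForItem stands in for DoWorkForTask):

    ```csharp
    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    class Demo
    {
        static void Main()
        {
            var collection = new BlockingCollection<int>();

            var producer = Task.Run(() =>
            {
                for (int i = 0; i < 10; i++)
                    collection.Add(i);
                collection.CompleteAdding(); // signal: no more items are coming
            });

            // Without CompleteAdding(), this consuming loop would block forever.
            Parallel.ForEach(collection.GetConsumingEnumerable(), item =>
            {
                DoWorkForItem(item);
            });

            producer.Wait();
        }

        static void DoWorkForItem(int item) => Console.WriteLine(item);
    }
    ```

    Caveat: Parallel.ForEach's default partitioner buffers chunks of items from GetConsumingEnumerable(), which is exactly why the answer recommends GetConsumingPartitioner() from ParallelExtensionsExtras instead.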
    
svick