Is there a good way to pass thread-local data into an ActionBlock, so that if MaxDegreeOfParallelism in its ExecutionDataflowBlockOptions is set to a value greater than 1, each task that executes the action has its own thread-local data?

Here is some of my code that will perhaps clarify what I want to do:

var options = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 12
};

ActionBlock<int> actionBlock = new ActionBlock<int>(PerformAction, options);

List<int> resultsList = new List<int>();

void PerformAction(int i)
{
    // do some work

    // add them to resultsList 

    // I want to make sure that each thread that executes this method has its
    // own copy of resultsList
}

I want to be able to have the ActionBlock call a thread local init function that I supply. Something like this:

new ActionBlock<int>(PerformAction, options, () => new List<int>()); 

And have it pass my thread local data into my Action function:

void PerformAction(int i, List<int> localUserData) {...}
svick
dmg
  • Could you explain in more detail what you need and why? – svick Feb 04 '13 at 19:26
  • But why do you want to do that? How are you going to use the `resultsList`s? Why do they have to be thread-local? – svick Feb 04 '13 at 22:13
  • They have to be thread-local for optimization purposes, because I don't want to new up this object for every invocation of the Action function. It's a micro-optimization made necessary by profiling and by looking at GC behavior at application run time. – dmg Feb 04 '13 at 22:49
  • I know there are several other ways of implementing this, but I was looking for a clean way within the confines of the TPL Dataflow API. It looks like the API is lacking this feature. – dmg Feb 04 '13 at 22:50
  • Consider whether my answer to this question http://stackoverflow.com/questions/15265761/tpl-dataflow-local-storage-or-something-like-it/15286413#15286413 applies to your case. If it does, it's a safer approach IMO than the answer posted below. – Andrew Arnott Mar 08 '13 at 03:57

1 Answer

I still don't understand why you need a thread-local list in a dataflow block. You're right that TDF has no explicit support for thread-local values (the way Parallel.ForEach() does). That doesn't mean you can't use them; you just have to do everything manually, using ThreadLocal<T>. (I think [ThreadStatic] wouldn't work well here, because it doesn't let you track all thread-local instances.) For example:

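using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks.Dataflow;

internal static class Program
{
    // One list per thread that runs PerformAction.
    private static ThreadLocal<List<int>> threadLocalList;

    private static void Main()
    {
        // trackAllValues: true is what makes threadLocalList.Values usable below.
        threadLocalList = new ThreadLocal<List<int>>(() => new List<int>(), trackAllValues: true);

        var block = new ActionBlock<int>(
            (Action<int>)PerformAction,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        for (int i = 0; i < 10; i++)
            block.Post(i);

        block.Complete();
        block.Completion.Wait();

        // Print the contents of each thread's list, one line per thread.
        foreach (var list in threadLocalList.Values)
            Console.WriteLine(string.Join(", ", list));

        threadLocalList.Dispose();
    }

    private static void PerformAction(int i)
    {
        threadLocalList.Value.Add(i * i);
    }
}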
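
For comparison, here is a minimal sketch of the built-in local-state support in Parallel.ForEach() that the answer alludes to. The class and method names (ParallelLocalDemo, SquareAll) are made up for illustration; the overload with localInit and localFinally delegates is the standard one. Note that the local state here belongs to each worker task of the loop, which is close to, but not the same thing as, a thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelLocalDemo
{
    // Squares each input; every worker task accumulates into its own list,
    // and the lists are merged once per worker at the end.
    public static List<int> SquareAll(IEnumerable<int> inputs)
    {
        var merged = new List<int>();
        var mergeLock = new object();

        Parallel.ForEach<int, List<int>>(
            inputs,
            () => new List<int>(),        // localInit: creates one list per worker task
            (i, loopState, local) =>      // body: receives the local list and returns it
            {
                local.Add(i * i);
                return local;
            },
            local =>                      // localFinally: runs once per worker task
            {
                lock (mergeLock)
                    merged.AddRange(local);
            });

        // The merge order depends on scheduling, so sort for a stable result.
        merged.Sort();
        return merged;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SquareAll(Enumerable.Range(0, 10))));
    }
}
```

ActionBlock has no equivalent of the localInit and localFinally parameters, which is why the ThreadLocal<T> workaround above is needed.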
svick
  • Very cool. I was unaware of the existence of this ThreadLocal<> class. It actually does exactly what I need. – dmg Feb 06 '13 at 15:02
  • Be very careful. Dataflow does not guarantee that individual parallel tasks are executed on different threads, or that a single task executes on only one thread. If your delegate is synchronous and you know how the TaskScheduler that you're using works, then you could probably get away with this technique, but it seems pretty hazardous, because someone could slightly change something in the code and break your assumptions. – Andrew Arnott Mar 08 '13 at 03:56
  • @AndrewArnott Yeah, I was assuming synchronous delegates. But I'm not sure how a different task scheduler could break this code. – svick Mar 08 '13 at 08:25
  • @svick A different TaskScheduler may execute all tasks on one thread. For synchronous delegates that's probably OK with this code. For async delegates it means the block's async delegate is suddenly sharing state with all the other "parallel" block executions, which defeats what dmg was asking for. In fact, that can happen even on the thread pool TaskScheduler with async delegates. – Andrew Arnott Mar 09 '13 at 14:53
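
One way to sidestep the thread-affinity concern raised in the comments above is to rent the reusable object from a pool for the duration of each invocation instead of tying it to a thread. The following is only a sketch under assumptions: PooledScratchDemo, Pool and Run are hypothetical names, the pool is a plain ConcurrentBag<T> rather than anything provided by TPL Dataflow, and this is not necessarily what the linked answer proposes. The inner loop is a stand-in for real work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks.Dataflow;

public static class PooledScratchDemo
{
    // Hypothetical pool of reusable scratch lists (not part of TPL Dataflow).
    private static readonly ConcurrentBag<List<int>> Pool = new ConcurrentBag<List<int>>();

    // For each i in [0, count), fills a scratch list with 0..i and stores its sum.
    public static int[] Run(int count, int parallelism)
    {
        var results = new int[count];

        var block = new ActionBlock<int>(i =>
        {
            // Rent a list; allocate only when the pool is empty.
            if (!Pool.TryTake(out var scratch))
                scratch = new List<int>();

            try
            {
                scratch.Clear();
                for (int k = 0; k <= i; k++)   // stand-in for real work
                    scratch.Add(k);
                results[i] = scratch.Sum();    // each invocation writes its own slot
            }
            finally
            {
                Pool.Add(scratch);             // return the list for reuse
            }
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = parallelism });

        for (int i = 0; i < count; i++)
            block.Post(i);

        block.Complete();
        block.Completion.Wait();
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run(10, 4)));
    }
}
```

Because a list is owned by exactly one invocation between TryTake and Add, the number of lists allocated should stay around the degree of parallelism, and the ownership does not depend on which thread runs the delegate.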