
I'm currently trying to improve my understanding of Multithreading and the TPL in particular. A lot of the constructs make complete sense and I can see how they improve scalability / execution speed.

I know that for asynchronous calls that don't tie up a thread (like I/O bound calls), Task.WhenAll would be the perfect fit. One thing I am wondering about, though, is the best practice for making CPU-bound work that I want to run in parallel asynchronous.

To make code run in parallel the obvious choice would be the Parallel class. As an example, say I have an array of data I want to perform some number crunching on:

string[] arr = { "SomeData", "SomeMoreData", "SomeOtherData" };
Parallel.ForEach(arr, (s) =>
{
    SomeReallyLongRunningMethod(s);
});

This would run in parallel (to whatever degree the scheduler decides to spread the work across threads), but it would also block the calling thread until the loop finishes.

Now the first thing that came to my mind was simply wrapping it all in Task.Run(), à la:

string[] arr = { "SomeData", "SomeMoreData", "SomeOtherData" };
await Task.Run(() => Parallel.ForEach(arr, (s) =>
{
    SomeReallyLongRunningMethod(s);
}));

Another option would be to either have a separate Task-returning method or inline it, and use Task.WhenAll like so:

static async Task SomeReallyLongRunningMethodAsync(string s)
{
    await Task.Run(() =>
    {
        //work...
    });
}
// ...
await Task.WhenAll(arr.Select(s => SomeReallyLongRunningMethodAsync(s)));

The way I understand it, option 1 creates a Task that will, for its whole lifetime, tie up a thread that just sits there and waits until the Parallel.ForEach finishes. Option 2 uses Task.WhenAll (I don't know whether that ties up a thread or not) to await all the Tasks, but the Tasks have to be created manually. Some of my resources (especially MS ExamRef 70-483) explicitly advise against manually creating Tasks for CPU-bound work, as the Parallel class is supposed to be used for it.

Now I'm left wondering about the best performing version / best practice for the problem of wanting parallel execution that can be awaited. I hope some more experienced programmer can shed some light on this for me!

Jejuni
  • Might be easier to answer if you specify what you want to avoid blocking on the main thread. It might depend on whether the main thread is a UI thread that you want to keep responsive, or something else. – Slai Oct 22 '17 at 18:22
  • The question is really too broad. There are too many different factors that might influence a design choice here. That said, either approach is likely to be "fine", in most scenarios. The main advantage to `Parallel` is that it can take advantage of knowledge of the size of the workload to partition that workload among threads for best performance, while the `Task`/`Select()` approach is going to follow the naïve thread pool rules (i.e. create a new thread if work items are waiting too long). For modest workloads, you'll likely not notice a significant difference. – Peter Duniho Oct 23 '17 at 00:24
  • If I/O is involved, you usually do not want to blindly launch N parallel tasks. You want to limit it. TPL dataflow is the best solution I've found for this. – Cory Nelson Oct 23 '17 at 03:36

2 Answers


Option 1 is the way to go, because the thread-pool thread running the task is not left idle: Parallel.ForEach also uses it to execute loop iterations. A similar question is answered here.
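A minimal sketch of option 1 as a complete program, for illustration only. `SomeReallyLongRunningMethod` is a hypothetical stand-in here (it spins briefly and returns the string length), and `CrunchAllAsync` and the result dictionary are names made up for this example; none of them come from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical stand-in for the CPU-bound method from the question.
    public static int SomeReallyLongRunningMethod(string s)
    {
        Thread.SpinWait(1_000_000); // simulate number crunching
        return s.Length;
    }

    // Hypothetical helper: runs the parallel loop on the thread pool
    // and hands back an awaitable task with the collected results.
    public static async Task<ConcurrentDictionary<string, int>> CrunchAllAsync(string[] arr)
    {
        var results = new ConcurrentDictionary<string, int>();

        // Task.Run moves the blocking Parallel.ForEach call off the caller's thread.
        await Task.Run(() => Parallel.ForEach(arr, s =>
        {
            // ConcurrentDictionary is safe to write to from several threads at once.
            results[s] = SomeReallyLongRunningMethod(s);
        }));

        return results;
    }

    public static async Task Main()
    {
        string[] arr = { "SomeData", "SomeMoreData", "SomeOtherData" };
        var results = await CrunchAllAsync(arr);

        // Enumeration order of the dictionary is not guaranteed.
        foreach (var pair in results)
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

The caller only awaits a single task, while the partitioning of the work is still left to the Parallel class.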

vaibhav kumar
  • Thanks, that clears that up! Simply wrapping something in a Task.Run() seemed like the "cheap way out", but if the thread is being used inside the loop, it seems well worth it. – Jejuni Oct 23 '17 at 21:06

You really should use Microsoft's Reactive Framework for this. It's the perfect solution. You can do this:

// Needs the System.Reactive NuGet package and `using System.Reactive.Linq;`
string[] arr = { "SomeData", "SomeMoreData", "SomeOtherData" };

var query =
    from s in arr.ToObservable()
    from r in Observable.Start(() => SomeReallyLongRunningMethod(s))
    select new { s, r };

IDisposable subscription =
    query
        .Subscribe(x =>
        {
            /* Do something with each `x.s` and `x.r` */
            /* Values arrive as soon as they are computed */
        }, () =>
        {
            /* All Done Now */
        });

This assumes that the signature of SomeReallyLongRunningMethod is int SomeReallyLongRunningMethod(string input), but it is easy to cope with something else.

It all runs in parallel on multiple threads.

If you need to marshal back to the UI thread you can do that with an .ObserveOn just prior to the .Subscribe call.

If you want to stop the computation early you can call subscription.Dispose().

Enigmativity