I recently came across the `ForEachAsync` extension method below, from the post "Implementing a simple ForEachAsync, part 2", as a way to efficiently run lots of I/O-bound tasks.
I'm under the impression that the following are true:

- This is much better than using `Parallel.ForEach` because the work is not CPU-bound. `ForEachAsync` will help queue as many I/O tasks as possible (without necessarily putting them on separate threads).
- The TPL will 'know' these are I/O-based tasks and will not spin up more threads, instead using callbacks/`TaskCompletionSource` to signal back to the main thread, thus saving the overhead of thread context switching.
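A quick way to sanity-check the idea behind the second point: an `await` on an incomplete task hands the thread back instead of blocking it, so many simulated I/O calls can be outstanding at once without one thread each. This is a minimal sketch that assumes `Task.Delay` is a fair stand-in for a database round trip (a real SqlClient call also does some CPU work); `SimulatedIoDemo` and its methods are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class SimulatedIoDemo
{
    // Hypothetical stand-in for an I/O-bound call such as a database query.
    // Task.Delay is completed by a timer, so no thread is blocked while it is pending.
    public static Task FakeIoCallAsync(int delayMs) => Task.Delay(delayMs);

    // Starts `count` simulated calls at once, awaits them all, and returns the
    // total elapsed time.
    public static async Task<TimeSpan> RunConcurrentlyAsync(int count, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, count).Select(_ => FakeIoCallAsync(delayMs)));
        return stopwatch.Elapsed;
    }
}
```

Calling `await SimulatedIoDemo.RunConcurrentlyAsync(1000, 200)` should report an elapsed time on the order of 200 ms rather than 1000 × 200 ms; exact figures depend on the machine and timer resolution.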
My question is: since `Parallel.ForEach` has its own `MaxDegreeOfParallelism` setting, how do I know what value to pass as the `dop` parameter of the `IEnumerable` extension in the example code?
For example, if I have 1000 items to process and need to make an I/O-bound SQL Server call for each item, would I specify 1000 as the `dop`? With `Parallel.ForEach` it is used as a limiter to prevent too many threads from spinning up, which might hurt performance. Here, though, it seems to set how many async tasks the work is partitioned into. I'm thinking there shouldn't really be a maximum as such (if anything, it should be at least the total number of items to process), because I want to queue as many I/O-bound calls to the database as possible.
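One way to see what `dop` actually controls in this extension: `GetPartitions(dop)` returns `dop` partitions, and each is drained by a single `Task.Run` loop that awaits one `body` call at a time, so `dop` is an upper bound on how many calls are in flight at once. The sketch below measures that peak. It assumes `Task.Delay(10)` as a stand-in for the SQL Server call; `DopProbe` and `MeasurePeakConcurrencyAsync` are hypothetical names, and the extension method is copied in as a plain static method so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DopProbe
{
    // Same body as the question's ForEachAsync, declared without `this` so it
    // cannot clash with the original extension method if both are compiled together.
    public static Task ForEachAsync<T>(IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }

    // Runs `itemCount` simulated calls and returns the largest number that were
    // in flight at the same time. Task.Delay(10) stands in for the database call.
    public static async Task<int> MeasurePeakConcurrencyAsync(int itemCount, int dop)
    {
        var gate = new object();
        int inFlight = 0, peak = 0;

        await ForEachAsync(Enumerable.Range(0, itemCount), dop, async _ =>
        {
            lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
            await Task.Delay(10);
            lock (gate) { inFlight--; }
        });

        return peak;
    }
}
```

With 1000 items and `dop = 10`, the reported peak should never exceed 10; with `dop = 1000`, up to 1000 database calls could be outstanding simultaneously.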
How do I go about deciding what to set the `dop` parameter to?
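```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
public static class EnumerableExtensions // wrapper class name is illustrative
{
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        // Split the source into `dop` partitions; each one is drained sequentially
        // by its own task, so at most `dop` calls to `body` run at once.
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }
}
```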
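There is no single right `dop` for I/O-bound work, so one practical approach is to measure. The sketch below times a full pass for several candidate values so you can see where adding concurrency stops helping. `DopBenchmark`, `TimeDopValuesAsync` and the `callAsync` parameter are hypothetical names; against a real database you would pass your actual per-item call instead of a fake one.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class DopBenchmark
{
    // Same body as the question's ForEachAsync, declared without `this`.
    public static Task ForEachAsync<T>(IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }

    // Times one complete pass over `items` for each candidate dop value.
    // `callAsync` is the per-item I/O call being measured.
    public static async Task<Dictionary<int, TimeSpan>> TimeDopValuesAsync<T>(
        IReadOnlyList<T> items, IEnumerable<int> candidates, Func<T, Task> callAsync)
    {
        var results = new Dictionary<int, TimeSpan>();
        foreach (int dop in candidates)
        {
            var stopwatch = Stopwatch.StartNew();
            await ForEachAsync(items, dop, callAsync);
            results[dop] = stopwatch.Elapsed;
        }
        return results;
    }
}
```

Against a real SQL Server the timings will typically flatten out (or get worse) beyond some point, because the server, the network, or the client connection pool becomes the bottleneck. For example, SqlClient's connection pool defaults to `Max Pool Size=100`, so if each call opens its own connection, a `dop` well above 100 mostly adds callers waiting for a pooled connection.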