3

I have some c# code (MVC WebAPI) which iterates over an array of IDs in parallel and makes an API call for each entry. In the first version, the whole code was a simple, synchronous for loop. Now we changed that to a combination of Task.WhenAll and a LINQ select:

private async Task RunHeavyLoad(IProgress<float> progress) {
  List<MyObj> myElements = new List<MyObj>(someEntries);
  float totalSteps = 1f / myElements.Count();
  int currentStep = 0;

  await Task.WhenAll(myElements.Select(async elem => {
    var result = await SomeHeavyApiCall(elem);
    DoSomethingWithThe(result);
    progress.Report(totalSteps * System.Threading.Interlocked.Increment(ref currentStep) * .1f);
  }

  // Do some more stuff
}

This is a simplified version of the original method! The actual method EnforceImmediateImport is called by this SignalR hub method:

public class ImportStatusHub : Hub {
  public async Task RunUnscheduledImportAsync(DateTime? fromDate, DateTime? toDate) {
    Clients.Others.enableManualImport(false);

    try {
      Progress<float> progress = new Progress<float>((p) => Clients.All.updateProgress(p));
      await MvcApplication.GlobalScheduler.EnforceImmediateImport(progress, fromDate, toDate);

    } catch (Exception ex) {
      Clients.All.importError(ex.Message);
    }

    Clients.Others.enableManualImport(true);
  }
}

Now I wonder, if this is "thread safe" per se, or if I need to do something with the progress.Report calls to prevent anything from going wrong.

André Reichelt
  • 1,484
  • 18
  • 52
  • Does this answer your question? [IProgress synchronization](https://stackoverflow.com/questions/17982555/iprogresst-synchronization) – Sebastian L Jan 21 '20 at 10:11
  • 1
    @SebastianL As far as I understand that post, the problem described there is exclusive to console applications. In my case, it's an MVC WebAPI project. – André Reichelt Jan 21 '20 at 10:15
  • The thread-safery of this code depends on the implementation of the `IProgress` interface. The default implementation (class [`Progress`](https://learn.microsoft.com/en-us/dotnet/api/system.progress-1)) is thread-safe. You have to show us how you call the `RunHeavyLoad` method, what is the actual type of the `progress` argument, and also show its associated delegate (if any). – Theodor Zoulias Jan 21 '20 at 12:41
  • @TheodorZoulias I've added some actual code of my project to make the context a little bit clearer. – André Reichelt Jan 21 '20 at 12:59
  • The thread safety now depends on the implementation of the `Clients.All.updateProgress` method. This method may be invoked concurrently by multiple threads. Unless there is a `SynchronizatioContext` installed by the web framework, that serializes the invocations. Could you debug-print the property [`SynchronizationContext.Current`](https://learn.microsoft.com/en-us/dotnet/api/system.threading.synchronizationcontext.current) to see if it's `null` or not? – Theodor Zoulias Jan 21 '20 at 13:11
  • @TheodorZoulias I actually have no idea, how the method is implemented.`Client.All` is a SignalR library property. It's basically a blank. You can put whatever method behind it and it will call it on the client side. https://learn.microsoft.com/de-de/aspnet/signalr/overview/guide-to-the-api/hubs-api-guide-net-client – André Reichelt Jan 21 '20 at 14:30
  • Then you should probably look at the documentation of this library. Hopefully it will be mentioned if the classes/methods are thread-safe or not. If it's not mentioned, and you want to be on the safe side, you could just add your own synchronization. For example by using a [`lock`](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/lock-statement). – Theodor Zoulias Jan 21 '20 at 16:34

2 Answers2

2

From the docs:

Remarks

Any handler provided to the constructor or event handlers registered with the ProgressChanged event are invoked through a SynchronizationContext instance captured when the instance is constructed. If there is no current SynchronizationContext at the time of construction, the callbacks will be invoked on the ThreadPool.

For more information and a code example, see the article Async in 4.5: Enabling Progress and Cancellation in Async APIs in the .NET Framework blog.

Like anything else using the SynchronizationContext, it's safe to post from multiple threads.

Custom implementations of IProgress<T> should have their behavior defined.

Paulo Morgado
  • 14,111
  • 3
  • 31
  • 59
-1

On your question, internally, Progress only does invoking. It is up to the code you wrote to handle the progress on the other side. I would say that the line progress.Report(totalSteps * System.Threading.Interlocked.Increment(ref currentStep) * .1f); can cause a potential progress reporting issue due to the multiplication which is not atomic.

This is what happens internally inside Progress when you call Report

  protected virtual void OnReport(T value)
  {
      // If there's no handler, don't bother going through the sync context.
      // Inside the callback, we'll need to check again, in case 
      // an event handler is removed between now and then.
      Action<T> handler = m_handler;
      EventHandler<T> changedEvent = ProgressChanged;
      if (handler != null || changedEvent != null)
      {
          // Post the processing to the sync context.
          // (If T is a value type, it will get boxed here.)
          m_synchronizationContext.Post(m_invokeHandlers, value);
      }
  }

On the code though, a better way to run in parallel is to use PLinq. In your current code, if the list contains many items, it will spin up tasks for every single item at the same time and wait for all of them to complete. However, in PLinq, the number of concurrent executions will be determined for you to optimize performance.

myElements.AsParallel().ForAll(async elem =>
{
    var result = await SomeHeavyApiCall(elem);
    DoSomethingWithThe(result);
    progress.Report(totalSteps * System.Threading.Interlocked.Increment(ref currentStep) * .1f);
}

Please keep in mind that AsParallel().ForAll() will immediately return when using async func. So you might want to capture all the tasks and wait for them before you proceed.

One last thing, if your list is being edited while it is being processed, i recommend using ConcurrentQueue or ConcurrentDictionary or ConcurrentBag.

Adel Alzubeir
  • 93
  • 1
  • 3
  • Thank you for your ideas. First of all, the list of elements is fixed and won't change during execution. Right now, I have 62 elements in my list and I'm making a WCF call for each of them. From the received objects, I add only a single integer value to a `ConcurrentHashSet`. The progress result is pushed to various web clients via SignalR. – André Reichelt Jan 21 '20 at 11:23
  • You should be fine i believe, since the only issue that might arise is order, which you don't care about anyway. **but why are you dividing by 10?** wouldn't it be better to calculate on the event? If the result is larger than the previous, you update clients via SginalR, if not, you just ignore it. – Adel Alzubeir Jan 21 '20 at 11:43
  • Because the code above is just the first part of a longer data gathering operation. The remaining 90% will be provided by more code below. – André Reichelt Jan 21 '20 at 12:17
  • By the way: Should I use a `lock` statement around the `progress.Report` method, or would that be redundant? – André Reichelt Jan 21 '20 at 12:18
  • PLinq is intended for parallelizing CPU-bound operations. The operations in OP's code are probably I/O-bound (I assume it because they are asynchronous). – Theodor Zoulias Jan 21 '20 at 12:32
  • @TheodorZoulias The operations aren't very CPU heavy - at least on my end of the API. It's a method which returns a lot of data after a few seconds, though. The calls take around 1000 ms to receive the answer and 2 ms to iterate over the result. I only iterate over the first hirarchy of the result and put integer values into a `ConcurrentHashSet` Would you suggest to stay with the current solution (`Task.WhenAll`), or should I change it to PLinq? Or is there even a more elegant solution, maybe with functional programming? – André Reichelt Jan 21 '20 at 12:40
  • @AndréReichelt `Task.WhenAll` is OK. Unless you have a need for [throttling the I/O operations](https://stackoverflow.com/questions/10806951/how-to-limit-the-amount-of-concurrent-async-i-o-operations) (assuming that you are calling some Web API, and don't want to bombard the remote server with a million requests at once). – Theodor Zoulias Jan 21 '20 at 12:47
  • @TheodorZoulias PLinq is used to achieve a decent amount of parallelism regardless of bound operation. The main concern is not the target server responsiveness, it is how many context switching the CPU running the above code needs to do which ultimately wastes CPU cycles and CPU time and can be very inefficient. Sometimes, PLinq doesn't run all in parallel even if you ask it to because running in serial would achieve better results. – Adel Alzubeir Jan 21 '20 at 22:32
  • @AndréReichelt The only issue that might arise from what you have provided is the order of report progress. Lock is not required, it will only slow things down. Just keep in mind that the progress reporting might be out of order. – Adel Alzubeir Jan 21 '20 at 22:34
  • My advice is to forget PLinq and the `Parallel` class when you have asynchronous I/O-bound operations that you want to execute concurrently. It is sub-optimal to say the least, because it needlessly blocks `ThreadPool` threads among other reasons. The correct way to throttle asynchronous operations is to use asynchronous throttlers like the `SemaphoreSlim` (example [here](https://stackoverflow.com/questions/59843575/replace-task-whenall-with-plinq/59843871#59843871)), or async-aware libraries like the TPL Dataflow. – Theodor Zoulias Jan 21 '20 at 23:15
  • I digress, since the OP Is using `await SomeHeavyApiCall(elem);` the call is already in a task which will not block the thread! This is true if the method SomeHeavyApiCall() was not async Task. – Adel Alzubeir Jan 22 '20 at 07:09
  • @TheodorZoulias So the approach posted there is ok, even though it can produce quite a big amount of tasks? – André Reichelt Jan 22 '20 at 08:57
  • @AndréReichelt creating many tasks is not a problem. You could create a million tasks at a fraction of a second. – Theodor Zoulias Jan 22 '20 at 09:19