-1

I need to run many (10-100k) convertation operations in parallel. In this case I use tasks, SymaphoreSlim to control degree of parallelism and CancellationTokenSource to prevent not running tasks to run when exception is thrown. I'm trying to do something like this:

        var tasks = table.Body.Select(async row =>
        {
            await _semaphore.WaitAsync(_cts.Token).ConfigureAwait(false);
            try
            {
                await Convert(row, preparedData, table, mapper).ConfigureAwait(false);
            }
            catch (Exception)
            {

                _cts.Cancel();
                throw;
            }
        });

        await Task.WhenAll(tasks).ConfigureAwait(false);

But performance is awful. If I get rid of using tokens it takes just a few seconds.

        var tasks = table.Body.Select(async row =>
        {
            if (_canceled)
                return;

            await _semaphore.WaitAsync().ConfigureAwait(false);
            try
            {
                await Convert(row, preparedData, table, mapper).ConfigureAwait(false);
            }
            catch (Exception)
            {
                _canceled = true;
                throw;
            }
        });

        await Task.WhenAll(tasks).ConfigureAwait(false);

Is there a way to use CancellationTokens with a regular performance?

Andrey Alonzov
  • 350
  • 2
  • 9
  • How common is it for `Convert` to throw an exception? If there are no exceptions thrown, I'd have expected these two to have similar performance. After all, `WaitAsync()` just calls `WaitAsync(Timeout.Infinite, default(CancellationToken))`. `WaitAsync` also has a shortcut for it the `CancellationToken` has already been canceled. – canton7 Mar 26 '19 at 12:45
  • What data type is `table`? Can you show us what `Convert` does? – mjwills Mar 26 '19 at 12:46
  • Also, in your second example, `_canceled` is checked before the semaphore is acquired, at the very top of the method. Therefore I'd expect the `Select` invocation for each row to check `_canceled` before any of them has had a chance to either acquire the lock of call `Convert`, and so I don't think cancellation can possibly work here, as you've described. – canton7 Mar 26 '19 at 12:46
  • @mjwills, does the type of table matter? It is a custom type containing Body - `List>`. Convert does some convertation logic, throws custom exception on error and releases the symaphore. – Andrey Alonzov Mar 26 '19 at 12:52
  • Where are you checking cancellation token? Once started task won't be stopped implicitly, you need to do that manually... – Johnny Mar 26 '19 at 12:58
  • @canton7, you are right about checking `_canceled` before the semaphore, my bad. I've placed it after the semaphore and now it seems to be deadlocked – Andrey Alonzov Mar 26 '19 at 12:58
  • @Johnny, i'm not trying to cancel running tasks, just want to perevent another tasks to run. – Andrey Alonzov Mar 26 '19 at 12:59
  • @AndreyAlonzov well yes - I bet you're not releasing the semaphore in that case – canton7 Mar 26 '19 at 12:59
  • @AndreyAlonzov how do you instantiate the semaphore? How many times can it be acquired concurrently? – canton7 Mar 26 '19 at 13:00
  • @canton7, I instantiate int before creating tasks. `_semaphore = new SemaphoreSlim(_parallelOperations);`. Now I added `_semaphore.Release()` into if block where `_canceled` is checked and it works fast and everything is ok except `Task` status is `RanToCompletion` – Andrey Alonzov Mar 26 '19 at 13:04
  • 1
    @AndreyAlonzov what does `Convert` do? Why not use `Parallel.ForEach` or PLINQ which are built specifically for data parallelism? Parallel processing is one thing. Concurrent execution something very different. – Panagiotis Kanavos Mar 26 '19 at 13:05
  • Right, and _parallelOperations is > 1? – canton7 Mar 26 '19 at 13:05
  • @canton7, it is. I've tried from 20 to 500 – Andrey Alonzov Mar 26 '19 at 13:05
  • @PanagiotisKanavos, `Convert` uses another `async` method inside. As far as I know, `Parallel.ForEach` does not fit good in this case – Andrey Alonzov Mar 26 '19 at 13:07
  • @AndreyAlonzov what *do* you want to do? It sounds like you want to throttle conversations, not execute them concurrently or in parallel. You could use an ActionBlock/Dataflow pipeline for this, or Reactive Extensions. The choice depends on what `Convert` actually does. – Panagiotis Kanavos Mar 26 '19 at 13:09
  • @PanagiotisKanavos, i want to run them concurrently but I also want to trottle them to reduce execution time on exception at the very beginning. Convertation logic itself is complex and has many dependencies. – Andrey Alonzov Mar 26 '19 at 13:13
  • @AndreyAlonzov the code you've written would cancel all tasks in case of exception. Is that what you want? In that case a simple `ActionBlock` with a specific DOP and bounded capacity (ie input queue max size) would probably work. The specifics *matter* though - a conversation has many steps, possibly a state machine too. It's definitely not a single call to a method. – Panagiotis Kanavos Mar 26 '19 at 13:16
  • @PanagiotisKanavos, yes, to cancel all the tasks on exception is exactly what I want to do. I also want to get their exception messages then. I was also thinking about Dataflow. If you have an example with Dataflow - please provide it. – Andrey Alonzov Mar 26 '19 at 13:22
  • 1
    @AndreyAlonzov you still haven't posted anything about `Convert`. It *does* matter. You can't cancel running `Convert` methods if you don't pass a cancellation token to them. The problem isn't the Cancellation token. `Convert` itself ignores cancellation and will continue to ignore it no matter what library you use – Panagiotis Kanavos Mar 26 '19 at 13:25
  • @AndreyAlonzov without the tokens the conversations are *not* cancelled. They keep running but your code no longer awaits them. If you had a DOP of 100 and just 1 call threw, `await Task.WhenAll()` would also through but it *wouldn't* cancel the other 99 tasks – Panagiotis Kanavos Mar 26 '19 at 13:26
  • @PanagiotisKanavos, again: I don't need running tasks to be cancelled, just want to prevent new tasks from running. `_semaphore.WaitAsync(_cts.Token)` accepts a token so it will do it (and does, but very slowly) – Andrey Alonzov Mar 26 '19 at 13:29
  • In the fast version, if you replace `if (_canceled) return` with `if (_canceled) throw new OperationCanceledException();`, does it get as slow as the CancellationToken version? – Kevin Gosse Mar 26 '19 at 18:48
  • @KevinGosse, yes, it does. I've already tried it – Andrey Alonzov Mar 27 '19 at 16:05
  • @AndreyAlonzov Then you should try running it without Visual Studio and see if it's still as slow. Running with the debugger attached slows down exception handing **a lot** – Kevin Gosse Mar 27 '19 at 18:33
  • @KevinGosse, I had the same idea but it won't allow me to run an application in debug mode anyway, which is unacceptable. For now I've decided to stay with manual cancellation and returning boolean value for this purpose. – Andrey Alonzov Mar 28 '19 at 14:07

1 Answers1

0

When your code does _semaphore.WaitAsync(_cts.Token), the semaphore has to set up a cancelable wait; this means it has to register with the cancellation token and when it is signaled, remove the wait from the wait queue. So yes, I would expect performance issues with 100k registrations.

If you don't need to cancel the wait, I wouldn't.

await _semaphore.WaitAsync().ConfigureAwait(false);
_cts.Token.ThrowIfCancellationRequested();
Stephen Cleary
  • 437,863
  • 77
  • 675
  • 810
  • Thanks, @Stephen! It's the same even if I try to throw an exception directly. In Release build it take a long time (about 10 times longer than using boolean token) but it is much faster than in Debug. – Andrey Alonzov Apr 03 '19 at 12:19