2

I observed a strange phenomenon that occurs sometimes with an Rx query I wrote, that involves a CancellationToken. Two callbacks are registered to the same CancellationToken, one outside of the query and one that is part of the query. The intention of the CancellationToken is to signal the termination of the query. What happens is that sometimes the second callback is stuck in the middle of the execution, never completing, preventing the first callback from being invoked.

Below is a minimal example that reproduces the issue. It's not very minimal, but I can't reduce it any further. For example replacing the Switch operator with the Merge makes the issue disappear. The same happens if the exception thrown by the Task.Delay(1000, cts.Token) is swallowed.

public class Program
{
    public static void Main()
    {
        var cts = new CancellationTokenSource(500);
        cts.Token.Register(() => Console.WriteLine("### Token Canceled! ###"));
        try
        {
            Observable
                .Timer(TimeSpan.Zero, TimeSpan.FromMilliseconds(1000))
                .TakeUntil(Observable.Create<Unit>(observer =>
                    cts.Token.Register(() =>
                    {
                        Console.WriteLine("Before observer.OnNext");
                        observer.OnNext(Unit.Default);
                        Console.WriteLine("After observer.OnNext");
                    })))
                .Select(_ =>
                {
                    return Observable.StartAsync(async () =>
                    {
                        Console.WriteLine("Action starting");
                        await Task.Delay(1000, cts.Token);
                        return 1;
                    });
                })
                .Switch()
                .Wait();
        }
        catch (Exception ex) { Console.WriteLine("Failed: {0}", ex.Message); }
        Thread.Sleep(500);
        Console.WriteLine("Finished");
    }
}

Expected output:

Action starting
Before observer.OnNext
After observer.OnNext
### Token Canceled! ###
Failed: A task was canceled.
Finished

Actual output (sometimes):

Action starting
Before observer.OnNext
Failed: A task was canceled.
Finished

Try it on fiddle. You may need to run the program 3-4 times before the issue appears. Notice the two missing log entries. It seems that the call observer.OnNext(Unit.Default); never completes.

My question is: Does anyone have any idea what causes this issue? Also, how could I modify the CancellationToken-related part of the query, so that it performs its intended purpose (terminates the query), without interfering with other registered callbacks of the same CancellationToken?

.NET 5.0.1 & .NET Framework 4.8, System.Reactive 5.0.0, C# 9

Update: also .NET 6.0 with System.Reactive 5.0.0 (screenshot taken at June 4, 2022)


One more observation: The issue stops appearing if I modify the Observable.Create delegate so that it returns a Disposable.Empty instead of a CancellationTokenRegistration, like this:

.TakeUntil(Observable.Create<Unit>(observer =>
{
    cts.Token.Register(() =>
    {
        Console.WriteLine("Before observer.OnNext");
        observer.OnNext(default);
        Console.WriteLine("After observer.OnNext");
    });
    return Disposable.Empty;
}))

But I don't think that ignoring the registration returned by the cts.Token.Register is a fix.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • 1
    The problem goes away when you use `FromAsync` rather than `StartAsync`. `FromAsync` is just `Obserable.Defer(() => Observable.StartAsync(...))` so it seems like `StartAsync` by itself is causing some sort of race condition and the `Defer` is needed. This is just a bit of hand waving at the moment though. – Enigmativity Mar 16 '21 at 02:41
  • Hi @Enigmativity. I just tested the `Observable.FromAsync` modification, and I can still observe the issue appearing in my PC, with a similar frequency as `Observable.StartAsync`. But for some reason I can't reproduce it at all on [Fiddle](https://dotnetfiddle.net/Pyr6vg) today! – Theodor Zoulias Mar 16 '21 at 08:42
  • 2
    Interesting! Bloody race conditions. – Enigmativity Mar 16 '21 at 12:03
  • @Enigmativity I think that my best bet is to find a way to invoke the `observer.OnNext(Unit.Default);` asynchronously somehow. – Theodor Zoulias Mar 16 '21 at 12:21
  • Try starting it with the scheduler. – Enigmativity Mar 16 '21 at 22:49
  • Hi Theodor and @Enigmativity, I think there might be a bug/race in `TakeUntil` or maybe I misunderstand something. In my use case, I sporadically see at least one more emitted item, after the cancellation callback has been called (which should dispose of . Here is a fiddle: https://dotnetfiddle.net/bLYDgw, I can't repro it there but on my machine I can. Also can't repro it if I use a "pass-through" implementation of `WithCancellation` (commented out in the fiddle). – noseratio Jun 02 '22 at 07:58
  • My [related question](https://stackoverflow.com/q/72471152/1768303) (I asked it before I discovered this behavior) – noseratio Jun 02 '22 at 08:01
  • @noseratio these are problems that should be looked and fixed by the maintainers of the [Rx repository](https://github.com/dotnet/reactive) IMHO. But no one is listening or giving any useful feedback there. So I am not sure if it's worth it to even report these strange behaviors/race conditions/bugs. – Theodor Zoulias Jun 02 '22 at 08:04
  • @TheodorZoulias I saw a PR from you merged there. Maybe it isn't all lost? :) – noseratio Jun 02 '22 at 08:09
  • 1
    @noseratio ha ha! [This](https://github.com/dotnet/reactive/pull/1717/files) one? I don't think that fixing a typo in the docs counts that much! :-) – Theodor Zoulias Jun 02 '22 at 08:14
  • 1
    Oh I didn't peek inside :) I'll submit this as a bug. I still can repo it even if I use `Synchronize` next to `Interval`. – noseratio Jun 02 '22 at 08:25
  • @TheodorZoulias, FYI, this is where the active Rx.NET community seems to be living these days: https://reactivex.slack.com/archives/C02B9R3QA. It's searchable, too! – noseratio Jun 02 '22 at 09:24
  • @noseratio [this site](https://reactivex.slack.com/archives/C02B9R3QA) has a *"You need to sign in to see this page. -- xxx doesn’t have an account on this workspace."* barrier. :-( (I hate those) – Theodor Zoulias Jun 02 '22 at 09:29
  • @TheodorZoulias, yep I had to create a free Slack account. Somehow I've never used Slack before. FWIW, Slack and Discord seem to be were the dev communities are moving, in demise of StackOverflow. – noseratio Jun 02 '22 at 09:40
  • 1
    BTW, `.TakeUntil(Task.Delay(Timeout.Infinite, token).ToObservable())` seems to be working fine, can't repo my issue with it. – noseratio Jun 02 '22 at 09:44
  • 1
    @noseratio I was about to say that it's probably because the `Task.Delay` [invokes its continuations asynchronously](https://github.com/dotnet/runtime/pull/64217 "Ensure that canceled Task.Delays invoke continuations asynchronously from Cancel"), but this PR was merged after the .NET 6.0 release. – Theodor Zoulias Jun 02 '22 at 10:00

2 Answers2

1

1yr later, I can't repro it. Are you still able to? I've tried it with .NET Framework 4.8 and .NET 6.

Yet, in my related question, I've just dealt with a race condition resulting in a deadlock, which might be related to the issue you've described.

CancellationTokenRegistration.Dispose is a blocking call by its nature, it waits for all callbacks to complete, and those callbacks are executed on the thread that called CancellationTokenSource.Cancel (e.g., a random timer callback thread), which is likely different from a thread that invokes Dispose of the Rx subscription.

I haven't dug deep, but the workaround in my case was to not invoke the token registration's Dispose if the token is already cancelled. This should be OK, as the token registrations are discarded after the cancellation is signaled, and we're not interested in the cancellation callback anymore when Rx calls Dispose:

    .TakeUntil(Observable.Create<Unit>(observer => 
    {
        var token = cts.Token;
        var rego = token.Register(() =>
        {
            Console.WriteLine("Before observer.OnNext");
            observer.OnNext(Unit.Default);
            Console.WriteLine("After observer.OnNext");
        });
        return Disposable.Create(() => 
        {
            if (!token.IsCancellationRequested)
            {
                // this 
                rego.Dispose();
            }
        });
    }))       

In .NET 6, one other possible workaround might be to use CancellationTokenRegistration.DisposeAsync (though I haven't verified if it would cure the deadlock in my case):

        return Disposable.Create(() => 
        {
            DisposeAsync();
            async void DisposeAsync() => 
              await rego.DisposeAsync().ConfigureAwait(false);
        });

Updated, as Theodor mentioned in the comments, using CancellationTokenRegistration.Unregister instead of Dispose is yet another workaround (non-blocking), which feels even cleaner to me.

noseratio
  • 59,932
  • 34
  • 208
  • 486
  • Yes, it is still reproducible. Try [this](https://dotnetfiddle.net/MIkqwU) fiddle (.NET 6). It happens on one every three runs on average. [Screenshot](https://prnt.sc/KX610WbCHAqS). – Theodor Zoulias Jun 04 '22 at 02:44
  • @TheodorZoulias, the fiddle link seems to be broken. – noseratio Jun 04 '22 at 02:47
  • @TheodorZoulias, indeed. So both of the workarounds seem to work, with either of them I cannot get the "Failed..." message. Otherwise, can. That must be where the race is. Could you confirm? – noseratio Jun 04 '22 at 02:55
  • 1
    The log entry `Failed: A task was canceled.` is always printed. The effect of the bug is that these two entries are missing in the logs: `After observer.OnNext` and `### Token Canceled! ###`. Your workaround seems to work. Another possible workaround that I found just now it to call [`Unregister`](https://learn.microsoft.com/en-us/dotnet/api/system.threading.cancellationtokenregistration.unregister) instead of `Dispose`. – Theodor Zoulias Jun 04 '22 at 03:04
  • The `async void` workaround is too frightening though. It looks like a crashing unhandled exception waiting to happen. :-) – Theodor Zoulias Jun 04 '22 at 03:07
  • The effect of the bug - you right, I got distracted and typed in rubbish :) `Unregister` feels even cleaner. – noseratio Jun 04 '22 at 03:11
  • But in general If there's one thing that I've learnt about Rx .NET, that'd be: *don't block inside `Dispose`* Even if it means offloading it to thread pool where possible, or re-scheduling with a custom scheduler. I'm not afraid of `async void` in this case. The clean-up code shouldn't be throwing exceptions, and if it does, it'd better crash my app (unhandled `async void` exceptions are queued for re-throw on thread pool or a sync context, if any). – noseratio Jun 04 '22 at 03:15
  • Honestly, regarding this issue I am more inclined to believe that the culprit is the `CancellationToken`, and not the Rx library! – Theodor Zoulias Jun 04 '22 at 03:21
  • 1
    @TheodorZoulias, IDK... while blocking inside CTR's `Dispose`, a racing cancellation callback might happen which would call `OnNext` which AFAIU would be a violation of Rx protocol. The whole thing is tricky. That's why I like RxJS where everything happens on the event loop :) We can have that in .NET too of course, but by default we don't :) – noseratio Jun 04 '22 at 03:29
1

After @noseratio posted his workaround, I found another one. I replaced the call CancellationTokenRegistration.Dispose with the call CancellationTokenRegistration.Unregister, and I can no longer reproduce the issue:

var cts = new CancellationTokenSource(500);
cts.Token.Register(() => Console.WriteLine("### Token Canceled! ###"));
try
{
    Observable
        .Timer(TimeSpan.Zero, TimeSpan.FromMilliseconds(1000))
        .TakeUntil(Observable.Create<Unit>(observer =>
            Disposable.Create(cts.Token.Register(() =>
            {
                Console.WriteLine("Before observer.OnNext");
                observer.OnNext(Unit.Default);
                Console.WriteLine("After observer.OnNext");
            }), state => state.Unregister())))
        .Select(_ =>
        {
            return Observable.StartAsync(async () =>
            {
                Console.WriteLine("Action starting");
                await Task.Delay(1000, cts.Token);
                return 1;
            });
        })
        .Switch()
        .Wait();
}
catch (Exception ex) { Console.WriteLine("Failed: {0}", ex.Message); }
Thread.Sleep(500);
Console.WriteLine("Finished");

Output (always 6 entries, as expected):

Action starting
Before observer.OnNext
After observer.OnNext
### Token Canceled! ###
Failed: A task was canceled.
Finished

Try it on Fiddle.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • 1
    I proposed it as `ToObservable` extension for `CancellationToken` in [Rx .NET repo](https://github.com/dotnet/reactive/issues/1773#:~:text=Here%20is%20an-,updated%20version,-aiming%20to%20solve). – noseratio Jun 04 '22 at 04:26