How to wrap SqlDataReader with IObservable properly?

Question

I would like to explore the possibility to use IObservable<T> as a wrapper around a SqlDataReader. Until now we were using the reader to avoid materializing the entire result in the memory and we did so using blocking synchronous API.

Now we want to try and use asynchronous API in conjunction with the .NET Reactive Extensions.

However, this code will have to coexist with a synchronous code as adopting the asynchronous ways is a gradual process.

We already know that this mix of synchronous and asynchronous would not work in ASP.NET - for that the entire request execution path must be asynchronous all throughout. An excellent article on the subject is http://blog.stephencleary.com/2012/07/dont-block-on-async-code.html

But I am talking about a plain WCF service. We already mix asynchronous and synchronous code there, however this is the first time we wish to introduce Rx and there are troubles.

I have created simple unit tests (we use mstest, sigh:-() to demonstrate the issues. My hope is that someone will be able to explain me what is going on. Please, find below the entire source code (using Moq):

using System;
using System.Data.Common;
using System.Diagnostics;
using System.Linq;
using System.Reactive.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Moq;

namespace UnitTests
{
    public static class Extensions
    {
        public static Task<List<T>> ToListAsync<T>(this IObservable<T> observable)
        {
            var res = new List<T>();
            var tcs = new TaskCompletionSource<List<T>>();
            observable.Subscribe(res.Add, e => tcs.TrySetException(e), () => tcs.TrySetResult(res));
            return tcs.Task;
        }
    }

    [TestClass]
    public class TestRx
    {
        public const int UNIT_TEST_TIMEOUT = 5000;

        private static DbDataReader CreateDataReader(int count = 100, int msWait = 10)
        {
            var curItemIndex = -1;

            var mockDataReader = new Mock<DbDataReader>();
            mockDataReader.Setup(o => o.ReadAsync(It.IsAny<CancellationToken>())).Returns<CancellationToken>(ct => Task.Factory.StartNew(() =>
            {
                Thread.Sleep(msWait);
                if (curItemIndex + 1 < count && !ct.IsCancellationRequested)
                {
                    ++curItemIndex;
                    return true;
                }
                Trace.WriteLine(curItemIndex);
                return false;
            }));
            mockDataReader.Setup(o => o[0]).Returns<int>(_ => curItemIndex);
            mockDataReader.CallBase = true;
            mockDataReader.Setup(o => o.Close()).Verifiable();
            return mockDataReader.Object;
        }

        private static IObservable<int> GetObservable(DbDataReader reader)
        {
            return Observable.Create<int>(async (obs, cancellationToken) =>
            {
                using (reader)
                {
                    while (!cancellationToken.IsCancellationRequested && await reader.ReadAsync(cancellationToken))
                    {
                        obs.OnNext((int)reader[0]);
                    }
                }
            });
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void ToListAsyncResult()
        {
            var reader = CreateDataReader();
            var numbers = GetObservable(reader).ToListAsync().Result;
            CollectionAssert.AreEqual(Enumerable.Range(0, 100).ToList(), numbers);
            Mock.Get(reader).Verify(o => o.Close());
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void ToEnumerableToList()
        {
            var reader = CreateDataReader();
            var numbers = GetObservable(reader).ToEnumerable().ToList();
            CollectionAssert.AreEqual(Enumerable.Range(0, 100).ToList(), numbers);
            Mock.Get(reader).Verify(o => o.Close());
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void ToEnumerableForEach()
        {
            var reader = CreateDataReader();
            int i = 0;
            foreach (var n in GetObservable(reader).ToEnumerable())
            {
                Assert.AreEqual(i, n);
                ++i;
            }
            Assert.AreEqual(100, i);
            Mock.Get(reader).Verify(o => o.Close());
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void ToEnumerableForEachBreak()
        {
            var reader = CreateDataReader();
            int i = 0;
            foreach (var n in GetObservable(reader).ToEnumerable())
            {
                Assert.AreEqual(i, n);
                ++i;
                if (i == 5)
                {
                    break;
                }
            }
            Mock.Get(reader).Verify(o => o.Close());
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void ToEnumerableForEachThrow()
        {
            var reader = CreateDataReader();
            int i = 0;
            try
            {
                foreach (var n in GetObservable(reader).ToEnumerable())
                {
                    Assert.AreEqual(i, n);
                    ++i;
                    if (i == 5)
                    {
                        throw new Exception("xo-xo");
                    }
                }
                Assert.Fail();
            }
            catch (Exception exc)
            {
                Assert.AreEqual("xo-xo", exc.Message);
                Mock.Get(reader).Verify(o => o.Close());
            } 
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void Subscribe()
        {
            var reader = CreateDataReader();
            var tcs = new TaskCompletionSource<object>();
            int i = 0;
            GetObservable(reader).Subscribe(n =>
            {
                Assert.AreEqual(i, n);
                ++i;
            }, () =>
            {
                Assert.AreEqual(100, i);
                Mock.Get(reader).Verify(o => o.Close());
                tcs.TrySetResult(null);
            });

            tcs.Task.Wait();
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void SubscribeCancel()
        {
            var reader = CreateDataReader();
            var tcs = new TaskCompletionSource<object>();
            var cts = new CancellationTokenSource();
            int i = 0;
            GetObservable(reader).Subscribe(n =>
            {
                Assert.AreEqual(i, n);
                ++i;
                if (i == 5)
                {
                    cts.Cancel();
                }
            }, e =>
            {
                Assert.IsTrue(i < 100);
                Mock.Get(reader).Verify(o => o.Close());
                tcs.TrySetException(e);
            }, () =>
            {
                Assert.IsTrue(i < 100);
                Mock.Get(reader).Verify(o => o.Close());
                tcs.TrySetResult(null);
            }, cts.Token);

            tcs.Task.Wait();
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void SubscribeThrow()
        {
            var reader = CreateDataReader();
            var tcs = new TaskCompletionSource<object>();
            int i = 0;
            GetObservable(reader).Subscribe(n =>
            {
                Assert.AreEqual(i, n);
                ++i;
                if (i == 5)
                {
                    throw new Exception("xo-xo");
                }
            }, e =>
            {
                Assert.AreEqual("xo-xo", e.Message);
                Mock.Get(reader).Verify(o => o.Close());
                tcs.TrySetResult(null);
            });

            tcs.Task.Wait();
        }
    }
}

These unit tests capture all the possible uses of an API returning an IObservable<T> wrapping a data reader:

People might want to materialize it completely using either our ToListAsync extension method or .ToEnumerable().ToList().
People might want to iterate over it using the ToEnumerable extension method. True - it blocks if the consumption is fast and it materializes the data in an internal queue if the consumption is slow, but this scenario is legitimate nonetheless.
Finally people might use the observable directly by subscribing to it, but they would have to wait for the end (blocking the thread) at some point, since most of the code around is still synchronous.

An essential requirement is that the data reader be promptly disposed of once the reading is over - regardless of the way the observable is consumed.

Of all the unit tests 4 fail:

SubscribeCancel and SubscribeThrow time out (i.e. deadlock)
ToEnumerableForEachBreak and ToEnumerableForEachThrow fail the validation of the data reader disposal.

The data reader disposal validation failure is a matter of timing - when foreach is left (either through exception or break) the respective IEnumerator is immediately disposed, which ultimately cancels the cancellation token used by the implementation of the observable. However, that implementation runs on another thread and by the time it notices the cancellation - the unit test is already over. In a real application the reader would be properly and rather promptly disposed of, but it is not synchronized with the end of the iteration. I am wondering whether it is possible to make the disposal of the aforementioned IEnumerator instance to wait until the cancellation is noticed by the respective IObservable implementation and the reader is disposed of.

Edit

So DbDataReader is IEnumerable, meaning if one wishes to enumerate the objects synchronously - no problem.

However, what if I want to do it asynchronously? I am banned to enumerate the reader in this case - it is a blocking operation. The only way out is to return an observable. Others discussed this subject already and in better language than I would ever do, for example - http://www.interact-sw.co.uk/iangblog/2013/11/29/async-yield-return

Hence I have to return an IObservable and I cannot use the ToObservable extension method, because it depends on the blocking enumeration of the reader.

Next, given an IObservable someone might convert it to an IEnumerable, which is stupid, given the fact that the reader is already an IEnumerable, but feasible and legitimate nonetheless.

Edit 2

Debugging the code with .NET Reflector (integrated with VS) reveals that the flow passes through the following method:

namespace System.Reactive.Threading.Tasks
{
  public static class TaskObservableExtensions
  {
    ...
    private static void ToObservableDone<TResult>(Task<TResult> task, AsyncSubject<TResult> subject)
    {
      switch (task.get_Status())
      {
      case TaskStatus.RanToCompletion:
        subject.OnNext(task.get_Result());
        subject.OnCompleted();
        return;

      case TaskStatus.Canceled:
        subject.OnError((Exception) new TaskCanceledException((Task) task));
        return;

      case TaskStatus.Faulted:
        subject.OnError(task.get_Exception().get_InnerException());
        return;
      }
    }
  }
}

Both cancellation of the token and throwing from the OnNext in an asynchronous subscription lands into this method (as well as successful completion). Both cancellation and throwing converge to the subject.OnError method. That method is supposed to ultimately delegate to the OnError handler. But it does not.

Edit 3

Following Why is the OnError callback never called when throwing from the given subscriber? I now wonder what should be the right approach to satisfy the following goals:

Expose objects available through reading an SqlDataReader instance asynchronously
Avoid materialization of the objects. The choice to materialize should be in the hands of the caller of the API.
The API should be usable in an environment where asynchronous code is mixed with synchronous. Why? Because we already have a server using synchronous IO and we want gradually phase out synchronous blocking IO with asynchronous one.

Having these goals in front of me I have come up with something like this (see the unit test code):

private static IObservable<int> GetObservable(DbDataReader reader)
{
    return Observable.Create<int>(async (obs, cancellationToken) =>
    {
        using (reader)
        {
            while (!cancellationToken.IsCancellationRequested && await reader.ReadAsync(cancellationToken))
            {
                obs.OnNext((int)reader[0]);
            }
        }
    });
}

Does it make sense to you? If not, what are the alternatives?

Next, I was thinking to use it as demonstrated by the Subscribe unit test code. However, the results of SubcribeCancel and SubscribeThrow show that this usage pattern is wrong. Why is the OnError callback never called when throwing from the given subscriber? explains why is it wrong.

So, what is the right way? How to prevent the consumers of the API from consuming it incorrectly (SubcribeCancel and SubscribeThrow are examples of such incorrect consumption).

as a side note, your `ToListAsync` function is redundant, as Rx includes the following operations: `source.ToList().ToTask();`. — cwharris, May 24 '14 at 17:19
You might be over-engineering your solution... http://blogs.msdn.com/b/rickandy/archive/2009/11/14/should-my-database-calls-be-asynchronous.aspx — cwharris, May 25 '14 at 01:23
@ChristopherHarris interesting read. However, we have reasons to believe that we are going to benefit from asynchronous IO in our particular application. We do have multiple DB calls in parallel, after all. — mark, May 25 '14 at 01:49
It seems the Interactive Extensions have AsyncEnumerable. You might take a look at that. https://rx.codeplex.com/SourceControl/latest#Ix.NET/Source/System.Interactive.Async/IAsyncEnumerable.cs — cwharris, May 26 '14 at 21:01
I will, but it does not explain the behavior when subscribing directly without ToEnumerable. — mark, May 27 '14 at 00:41

cwharris · Answer 1 · 2014-05-27T16:20:23.423

SubscribeCancel

SubscribeCancel is failing due to the cts cancel. This doesn't invoke the OnError handler.

Cancelling your cts is synonymous with disposing of your subscription. Disposing of a subscription causes all future OnNext, OnError, and OnCompleted calls to be ignored. Therefore, the task is never completed, and the test hangs forever.

Solution:

When you cancel the cts, set the Task to its proper state.

SubscribeThrow

SubscribeThrow is failing due to an exception in the OnNext handler.

Throwing an exception in an OnNext handler does not forward the exception to the OnError handler.

Solution:

Don't throw an exception in your Subscribe handlers. Instead, dispose of your subscription and set the Task to its proper state.

ToEnumerableForEachThrow & ToEnumerableForEachBreak

ToEnumerableForEachThrow and ToEnumerableForEachBreak are failing due to a race condition.

foreach(...) on the enumerable will call dispose on the underlying observable, which will cancel the cancellation token. After that, the exception is caught by your test's catch (or the break simply exits the foreach), where you test to see that the underlying reader is disposed... except the reader hasn't been disposed yet, because the observable is still waiting on the reader to yield the next result... only after the reader yields (and the observable yields), will the observable loop back around and check the cancellation token. At that point the observable breaks and exits out of the using block and disposes of the reader.

Solution:

Instead of your using (...) statement, return a Disposable from your Observable.Create. The disposable will be disposed when the subscription is disposed. This is what you want. Get rid of the using statement all together, and let Rx do it's job.

I disagree about the `SubscribeCancel` statement. The token may be cancelled deep down in the code. The requirement to propagate the respective Task instance to all the places where the token might be cancelled is insane and is unsupported by examining the Rx code with .NET Reflector - I am going to add an edit to my post right away. — mark, May 27 '14 at 19:03
Exactly what part do you disagree with? The statement of fact, or my recommended solution? — cwharris, May 27 '14 at 19:08
Please, see my edit in the post where I claim that the Rx code treats cancellation of the token and throwing from `OnNext` in exactly the same way and that both are expected to result in `OnError`, but do not for some reason. — mark, May 27 '14 at 19:10
Show me a link to the documentation that defines this as designed behavior. — cwharris, May 27 '14 at 19:19
Let's continue this via chat... http://chat.stackoverflow.com/rooms/54553/rx-behavior — cwharris, May 27 '14 at 19:25

How to wrap SqlDataReader with IObservable properly?

Edit

Edit 2

Edit 3

1 Answers1

SubscribeCancel

SubscribeThrow

ToEnumerableForEachThrow & ToEnumerableForEachBreak

Linked