I would like to explore the possibility of using `IObservable<T>` as a wrapper around a `SqlDataReader`. Until now we have been using the reader to avoid materializing the entire result set in memory, and we did so through the blocking synchronous API.
Now we want to try the asynchronous API together with the .NET Reactive Extensions (Rx).
However, this code will have to coexist with synchronous code, because adopting asynchrony is a gradual process.
We already know that this mix of synchronous and asynchronous code does not work in ASP.NET: blocking on asynchronous code there can deadlock, so the entire request execution path must be asynchronous from top to bottom. An excellent article on the subject is http://blog.stephencleary.com/2012/07/dont-block-on-async-code.html
But I am talking about a plain WCF service. We already mix asynchronous and synchronous code there; however, this is the first time we want to introduce Rx, and we are running into trouble.
I have created simple unit tests (we use MSTest, sigh :-( ) to demonstrate the issues. I hope someone can explain to me what is going on. Below is the entire source code (it uses Moq):
```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Diagnostics;
using System.Linq;
using System.Reactive.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Moq;

namespace UnitTests
{
    public static class Extensions
    {
        // Buffers every element; the task completes on OnCompleted and faults on OnError.
        public static Task<List<T>> ToListAsync<T>(this IObservable<T> observable)
        {
            var res = new List<T>();
            var tcs = new TaskCompletionSource<List<T>>();
            observable.Subscribe(res.Add, e => tcs.TrySetException(e), () => tcs.TrySetResult(res));
            return tcs.Task;
        }
    }

    [TestClass]
    public class TestRx
    {
        public const int UNIT_TEST_TIMEOUT = 5000;

        // Mock reader yielding 0..count-1; each ReadAsync call sleeps msWait ms on a pool thread.
        private static DbDataReader CreateDataReader(int count = 100, int msWait = 10)
        {
            var curItemIndex = -1;
            var mockDataReader = new Mock<DbDataReader>();
            mockDataReader.Setup(o => o.ReadAsync(It.IsAny<CancellationToken>())).Returns<CancellationToken>(ct => Task.Factory.StartNew(() =>
            {
                Thread.Sleep(msWait);
                if (curItemIndex + 1 < count && !ct.IsCancellationRequested)
                {
                    ++curItemIndex;
                    return true;
                }
                Trace.WriteLine(curItemIndex);
                return false;
            }));
            mockDataReader.Setup(o => o[0]).Returns<int>(_ => curItemIndex);
            // CallBase lets DbDataReader.Dispose run, which in turn calls the verifiable Close.
            mockDataReader.CallBase = true;
            mockDataReader.Setup(o => o.Close()).Verifiable();
            return mockDataReader.Object;
        }

        // Reads the reader asynchronously and disposes it when the read loop ends.
        private static IObservable<int> GetObservable(DbDataReader reader)
        {
            return Observable.Create<int>(async (obs, cancellationToken) =>
            {
                using (reader)
                {
                    while (!cancellationToken.IsCancellationRequested && await reader.ReadAsync(cancellationToken))
                    {
                        obs.OnNext((int)reader[0]);
                    }
                }
            });
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void ToListAsyncResult()
        {
            var reader = CreateDataReader();
            var numbers = GetObservable(reader).ToListAsync().Result;
            CollectionAssert.AreEqual(Enumerable.Range(0, 100).ToList(), numbers);
            Mock.Get(reader).Verify(o => o.Close());
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void ToEnumerableToList()
        {
            var reader = CreateDataReader();
            var numbers = GetObservable(reader).ToEnumerable().ToList();
            CollectionAssert.AreEqual(Enumerable.Range(0, 100).ToList(), numbers);
            Mock.Get(reader).Verify(o => o.Close());
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void ToEnumerableForEach()
        {
            var reader = CreateDataReader();
            int i = 0;
            foreach (var n in GetObservable(reader).ToEnumerable())
            {
                Assert.AreEqual(i, n);
                ++i;
            }
            Assert.AreEqual(100, i);
            Mock.Get(reader).Verify(o => o.Close());
        }

        // Fails: the reader is closed only after the test has already checked for it.
        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void ToEnumerableForEachBreak()
        {
            var reader = CreateDataReader();
            int i = 0;
            foreach (var n in GetObservable(reader).ToEnumerable())
            {
                Assert.AreEqual(i, n);
                ++i;
                if (i == 5)
                {
                    break;
                }
            }
            Mock.Get(reader).Verify(o => o.Close());
        }

        // Fails for the same timing reason as ToEnumerableForEachBreak.
        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void ToEnumerableForEachThrow()
        {
            var reader = CreateDataReader();
            int i = 0;
            try
            {
                foreach (var n in GetObservable(reader).ToEnumerable())
                {
                    Assert.AreEqual(i, n);
                    ++i;
                    if (i == 5)
                    {
                        throw new Exception("xo-xo");
                    }
                }
                Assert.Fail();
            }
            catch (Exception exc)
            {
                Assert.AreEqual("xo-xo", exc.Message);
                Mock.Get(reader).Verify(o => o.Close());
            }
        }

        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void Subscribe()
        {
            var reader = CreateDataReader();
            var tcs = new TaskCompletionSource<object>();
            int i = 0;
            GetObservable(reader).Subscribe(n =>
            {
                Assert.AreEqual(i, n);
                ++i;
            }, () =>
            {
                Assert.AreEqual(100, i);
                Mock.Get(reader).Verify(o => o.Close());
                tcs.TrySetResult(null);
            });
            tcs.Task.Wait();
        }

        // Times out (deadlock): neither the OnError nor the OnCompleted handler is ever called.
        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void SubscribeCancel()
        {
            var reader = CreateDataReader();
            var tcs = new TaskCompletionSource<object>();
            var cts = new CancellationTokenSource();
            int i = 0;
            GetObservable(reader).Subscribe(n =>
            {
                Assert.AreEqual(i, n);
                ++i;
                if (i == 5)
                {
                    cts.Cancel();
                }
            }, e =>
            {
                Assert.IsTrue(i < 100);
                Mock.Get(reader).Verify(o => o.Close());
                tcs.TrySetException(e);
            }, () =>
            {
                Assert.IsTrue(i < 100);
                Mock.Get(reader).Verify(o => o.Close());
                tcs.TrySetResult(null);
            }, cts.Token);
            tcs.Task.Wait();
        }

        // Times out (deadlock): the OnError handler is never called.
        [TestMethod, TestCategory("CI"), Timeout(UNIT_TEST_TIMEOUT)]
        public void SubscribeThrow()
        {
            var reader = CreateDataReader();
            var tcs = new TaskCompletionSource<object>();
            int i = 0;
            GetObservable(reader).Subscribe(n =>
            {
                Assert.AreEqual(i, n);
                ++i;
                if (i == 5)
                {
                    throw new Exception("xo-xo");
                }
            }, e =>
            {
                Assert.AreEqual("xo-xo", e.Message);
                Mock.Get(reader).Verify(o => o.Close());
                tcs.TrySetResult(null);
            });
            tcs.Task.Wait();
        }
    }
}
```
These unit tests cover all the possible ways of using an API that returns an `IObservable<T>` wrapping a data reader:
- People might want to materialize it completely, using either our `ToListAsync` extension method or `.ToEnumerable().ToList()`.
- People might want to iterate over it using the `ToEnumerable` extension method. True, this blocks if consumption is fast and buffers the data in an internal queue if consumption is slow, but the scenario is legitimate nonetheless.
- Finally, people might use the observable directly by subscribing to it, but at some point they would have to wait for the end (blocking the thread), since most of the surrounding code is still synchronous.
An essential requirement is that the data reader be disposed of promptly once reading is over, regardless of how the observable is consumed.
Of all the unit tests, four fail:
- `SubscribeCancel` and `SubscribeThrow` time out (i.e. deadlock).
- `ToEnumerableForEachBreak` and `ToEnumerableForEachThrow` fail the check that the data reader was disposed.
The disposal check failure is a matter of timing. When the `foreach` loop is exited (either through an exception or `break`), its `IEnumerator` is disposed immediately, which ultimately cancels the cancellation token used by the observable's implementation. However, that implementation runs on another thread, and by the time it notices the cancellation, the unit test is already over. In a real application the reader would be disposed of properly and fairly promptly, but that disposal is not synchronized with the end of the iteration. I wonder whether it is possible to make the disposal of that `IEnumerator` instance wait until the `IObservable` implementation has noticed the cancellation and disposed of the reader.
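A sketch of that kind of synchronization, written without Rx so it stands alone: keep a handle on the producer's `Task`, and let `Dispose` both request cancellation and block until that task, including its `using`/`finally` cleanup, has finished. `TrackedSubscription` is a hypothetical helper, not an Rx type; wiring the same idea into Rx (making the enumerator's disposal wait for the `Observable.Create` lambda to return) is an assumption about how one might do it, not something `ToEnumerable` provides out of the box.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Rx): runs an asynchronous producer and makes
// Dispose wait until the producer, including its using/finally cleanup, has finished.
public sealed class TrackedSubscription : IDisposable
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private bool _disposed;

    // Completes once the producer has returned, faulted, or observed cancellation.
    public Task Completion { get; }

    public TrackedSubscription(Func<CancellationToken, Task> producer)
    {
        Completion = Task.Run(() => producer(_cts.Token));
    }

    // Requests cancellation, then blocks until the producer's cleanup has run.
    // Must not be called from the producer's own thread: it would wait for itself.
    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        _cts.Cancel();
        try
        {
            Completion.Wait();
        }
        catch (AggregateException)
        {
            // A cancelled or faulted producer has still finished its cleanup.
        }
        _cts.Dispose();
    }
}
```

The blocking wait is only safe from the consumer's thread, such as the thread leaving a `foreach`. If `Dispose` were called from inside `OnNext`, on the producer's thread, it would wait for itself and hang.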
Edit
So `DbDataReader` is `IEnumerable`, meaning that if one wishes to enumerate the objects synchronously, there is no problem.
However, what if I want to do it asynchronously? In that case I cannot enumerate the reader, because enumeration is a blocking operation. The only way out is to return an observable. Others have already discussed this subject, and more eloquently than I ever could; see for example http://www.interact-sw.co.uk/iangblog/2013/11/29/async-yield-return
Hence I have to return an `IObservable`, and I cannot use the `ToObservable` extension method, because it relies on blocking enumeration of the reader.
Next, given an `IObservable`, someone might convert it to an `IEnumerable`. That is pointless, since the reader is already an `IEnumerable`, but it is feasible and legitimate nonetheless.
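For comparison, the non-blocking loop that any such wrapper ultimately has to run needs nothing beyond `DbDataReader.ReadAsync`. The sketch below is a hypothetical helper (`PumpAsync` is not an existing API) that hands each row to a callback and disposes the reader however the loop ends; it is the same loop as in `GetObservable`, without Rx.

```csharp
using System;
using System.Data;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;

public static class ReaderPump
{
    // Hypothetical helper: the non-blocking counterpart of `foreach (IDataRecord r in reader)`.
    // Every row is projected and passed to onRow; the reader is disposed however the loop
    // ends (normal completion, cancellation, or an exception thrown by project/onRow).
    public static async Task PumpAsync<T>(
        DbDataReader reader,
        Func<IDataRecord, T> project,
        Action<T> onRow,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        using (reader)
        {
            // ReadAsync(CancellationToken) is declared on DbDataReader; providers such as
            // SqlDataReader override it with genuinely asynchronous I/O.
            while (await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
            {
                onRow(project(reader));
            }
        }
    }
}
```

With a provider whose `ReadAsync` is truly asynchronous, no thread is blocked while the next row is being fetched; the callback still runs on whatever thread the awaited read completes on.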
Edit 2
Debugging the code with .NET Reflector (integrated into Visual Studio) reveals that execution passes through the following Rx method:
```csharp
namespace System.Reactive.Threading.Tasks
{
    public static class TaskObservableExtensions
    {
        ...
        private static void ToObservableDone<TResult>(Task<TResult> task, AsyncSubject<TResult> subject)
        {
            switch (task.Status)
            {
                case TaskStatus.RanToCompletion:
                    subject.OnNext(task.Result);
                    subject.OnCompleted();
                    return;
                case TaskStatus.Canceled:
                    subject.OnError(new TaskCanceledException(task));
                    return;
                case TaskStatus.Faulted:
                    subject.OnError(task.Exception.InnerException);
                    return;
            }
        }
    }
}
```
Both cancelling the token and throwing from `OnNext` in an asynchronous subscription land in this method (as does successful completion). Both cancellation and throwing converge on the `subject.OnError` call, which is supposed to ultimately delegate to the subscriber's `OnError` handler. But it does not.
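The behavior described in Edit 2 can be reproduced with a small model. The code below is a simplified illustration, not Rx's source: it assumes, as Rx's internal auto-detaching observer does, that the wrapper around the subscriber stops itself when the subscriber's `OnNext` throws. The exception then travels back into the producer loop, which reports it through `OnError`, but by then the wrapper is stopped and drops the notification. `DetachingObserver`, `ThrowOnFifth` and `DetachModel` are hypothetical names.

```csharp
using System;

// Simplified model (NOT Rx's source) of the wrapper placed around the observer
// passed to Observable.Create: if the subscriber's OnNext throws, the wrapper
// stops itself while the exception keeps propagating back to the producer.
public sealed class DetachingObserver<T> : IObserver<T>
{
    private readonly IObserver<T> _inner;
    public bool IsStopped { get; private set; }

    public DetachingObserver(IObserver<T> inner) { _inner = inner; }

    public void OnNext(T value)
    {
        if (IsStopped) return;
        var ok = false;
        try { _inner.OnNext(value); ok = true; }
        finally { if (!ok) IsStopped = true; } // the exception itself still propagates
    }

    public void OnError(Exception error)
    {
        if (IsStopped) return; // silently dropped once stopped
        IsStopped = true;
        _inner.OnError(error);
    }

    public void OnCompleted()
    {
        if (IsStopped) return;
        IsStopped = true;
        _inner.OnCompleted();
    }
}

// Subscriber in the spirit of the SubscribeThrow test: throws on the fifth item, records OnError.
public sealed class ThrowOnFifth : IObserver<int>
{
    public int Count;
    public Exception Error;
    public void OnNext(int value) { if (++Count == 5) throw new Exception("xo-xo"); }
    public void OnError(Exception error) { Error = error; }
    public void OnCompleted() { }
}

public static class DetachModel
{
    // Producer loop analogous to the async lambda in GetObservable: the subscriber's
    // exception escapes OnNext, ends the loop, and is then reported through OnError.
    public static ThrowOnFifth Run()
    {
        var subscriber = new ThrowOnFifth();
        var wrapped = new DetachingObserver<int>(subscriber);
        try
        {
            for (int i = 0; i < 100; i++) wrapped.OnNext(i);
            wrapped.OnCompleted();
        }
        catch (Exception e)
        {
            wrapped.OnError(e);
        }
        return subscriber;
    }
}
```

`SubscribeCancel` ends up in the same place by a different route: cancelling the token passed to `Subscribe` disposes the subscription, and a disposed subscription delivers no further notifications, neither `OnError` nor `OnCompleted`, so `tcs` is never completed and `tcs.Task.Wait()` blocks until the test times out.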
Edit 3
Following the question "Why is the OnError callback never called when throwing from the given subscriber?", I now wonder what the right approach is to satisfy the following goals:
- Expose the objects available through reading a `SqlDataReader` instance asynchronously.
- Avoid materializing the objects. The choice to materialize should be in the hands of the API's caller.
- The API should be usable in an environment where asynchronous code is mixed with synchronous code. Why? Because we already have a server that uses synchronous IO, and we want to gradually replace synchronous blocking IO with asynchronous IO.
With these goals in mind, I came up with something like this (see the unit test code):
```csharp
private static IObservable<int> GetObservable(DbDataReader reader)
{
    return Observable.Create<int>(async (obs, cancellationToken) =>
    {
        using (reader)
        {
            while (!cancellationToken.IsCancellationRequested && await reader.ReadAsync(cancellationToken))
            {
                obs.OnNext((int)reader[0]);
            }
        }
    });
}
```
Does it make sense to you? If not, what are the alternatives?
Next, I was planning to use it as demonstrated by the `Subscribe` unit test. However, the results of `SubscribeCancel` and `SubscribeThrow` show that this usage pattern is wrong; the question "Why is the OnError callback never called when throwing from the given subscriber?" explains why.
So, what is the right way? How can I prevent consumers of the API from using it incorrectly (`SubscribeCancel` and `SubscribeThrow` being examples of such incorrect use)?
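One consumption pattern that cannot hang, sketched here without the Rx package so it stands alone, is to hand callers a `Task` that is guaranteed to reach a final state: completed on `OnCompleted`, faulted on `OnError` or when the caller's `onNext` throws, and cancelled when the token fires. Rx ships a similar operator, `ForEachAsync`; the helper below (`ForEachTask`, `DelegateObserver` and the `BackgroundRange` stand-in source are all hypothetical names) only illustrates the idea and does not reproduce Rx's implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ObservableConsumption
{
    // Hypothetical helper: the returned Task always completes, faults, or is cancelled,
    // so a caller blocking on it cannot wait forever for a notification that never comes.
    public static Task ForEachTask<T>(this IObservable<T> source, Action<T> onNext, CancellationToken ct)
    {
        var tcs = new TaskCompletionSource<object>();
        var observer = new DelegateObserver<T>(
            x =>
            {
                if (tcs.Task.IsCompleted) return;                  // already finished: ignore late items
                try { onNext(x); }
                catch (Exception e) { tcs.TrySetException(e); }    // a throwing callback faults the task
            },
            e => tcs.TrySetException(e),
            () => tcs.TrySetResult(null));

        CancellationTokenRegistration registration = ct.Register(() => tcs.TrySetCanceled());
        IDisposable subscription = source.Subscribe(observer);

        // Whatever the outcome, tear down the subscription and the token registration.
        tcs.Task.ContinueWith(_ =>
        {
            subscription.Dispose();
            registration.Dispose();
        }, TaskContinuationOptions.ExecuteSynchronously);

        return tcs.Task;
    }
}

// Minimal IObserver<T> built from three delegates, so the sketch needs only the BCL.
internal sealed class DelegateObserver<T> : IObserver<T>
{
    private readonly Action<T> _onNext;
    private readonly Action<Exception> _onError;
    private readonly Action _onCompleted;

    public DelegateObserver(Action<T> onNext, Action<Exception> onError, Action onCompleted)
    {
        _onNext = onNext;
        _onError = onError;
        _onCompleted = onCompleted;
    }

    public void OnNext(T value) => _onNext(value);
    public void OnError(Exception error) => _onError(error);
    public void OnCompleted() => _onCompleted();
}

// Hypothetical stand-in for the reader-backed observable: emits 0..count-1 on a
// pool thread and stops as soon as its subscription is disposed.
public sealed class BackgroundRange : IObservable<int>
{
    private readonly int _count;
    public BackgroundRange(int count) { _count = count; }

    public IDisposable Subscribe(IObserver<int> observer)
    {
        var subscription = new FlagDisposable();
        Task.Run(() =>
        {
            try
            {
                for (int i = 0; i < _count && !subscription.IsDisposed; i++) observer.OnNext(i);
                if (!subscription.IsDisposed) observer.OnCompleted();
            }
            catch (Exception e)
            {
                if (!subscription.IsDisposed) observer.OnError(e);
            }
        });
        return subscription;
    }

    private sealed class FlagDisposable : IDisposable
    {
        private volatile bool _disposed;
        public bool IsDisposed => _disposed;
        public void Dispose() => _disposed = true;
    }
}
```

A caller that still has to block could then write `GetObservable(reader).ForEachTask(n => ..., token).Wait()`; the wait ends in both the `SubscribeCancel` and `SubscribeThrow` scenarios, surfacing as an `AggregateException` that wraps a `TaskCanceledException` or the callback's exception, respectively.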