I have a stream of data records being loaded from a database. I can't load and store all of them in memory because there are millions of them. The caller should process the records one by one (though of course I cannot guarantee that it will).
My first try was to return a lazy sequence, IEnumerable<Records>, whose items would be loaded on demand and returned by yield return statements.
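For illustration, a lazy sequence of this kind might look like the sketch below. It is not my real code: LazyLoadingSketch, LoadBatch and the BatchesLoaded counter are hypothetical stand-ins, and the "database" is just a range of integers. It only shows that an iterator body does no work until the caller enumerates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyLoadingSketch
{
    // Counts simulated database round trips (assumption: one per batch).
    public static int BatchesLoaded;

    // Hypothetical stand-in for a synchronous query that returns one batch.
    private static List<int> LoadBatch(int start, int size)
    {
        BatchesLoaded++;
        return Enumerable.Range(start, size).ToList();
    }

    // Iterator method: nothing in this body runs until the caller starts
    // enumerating, and each batch is loaded only when the caller reaches it.
    public static IEnumerable<int> GetRecords(int total, int batchSize)
    {
        for (int start = 0; start < total; start += batchSize)
        {
            int size = Math.Min(batchSize, total - start);
            foreach (int record in LoadBatch(start, size))
            {
                yield return record;
            }
        }
    }
}
```

Calling LazyLoadingSketch.GetRecords(10, 5) loads nothing by itself; enumerating only the first three items with Take(3) loads just the first batch.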
But I couldn't use async/await (which is used to get data from the database) in this method, because yield return requires a return type of IEnumerable<>. As a result, I cannot mark the method async and return Task<IEnumerable<>>.
Reading this convinced me to try Reactive Extensions, as I can await async methods and return IObservable<>.
But as far as I can tell, as soon as someone subscribes to my observable, the method that pulls the data is invoked and pulls all the data at once.
This is what part of my method looks like:
    IList<int> ids = (...);
    return Observable.Create<NitemonkeyRegistration>(async obs =>
    {
        using (SqlDataReader reader = await command.ExecuteReaderAsync())
        {
            if (!reader.HasRows)
                obs.OnCompleted();

            while (await reader.ReadAsync())
                ids.Add(reader.GetInt32(reader.GetOrdinal("RegistrationId")));

            for (int i = 0; i < ids.Count; i += 1000)
            {
                //heavy database operations
                var registrations = await GetRegistrationsByIds(connection, ids.Skip(i).Take(1000));
                foreach (var pulledReg in registrations)
                {
                    obs.OnNext(pulledReg);
                }
            }
        }
    });
Can I put the caller in control, so that when they call .Next() on the observable, my code pulls the data on demand?
How can I implement something similar to yield return using Reactive Extensions?
UPDATE
This is my consumer code:
    var cancellationTokenSource = new CancellationTokenSource();
    await Observable.ForEachAsync<NitemonkeyRegistration>(niteMonkeySales, async (record, i) =>
    {
        try
        {
            await SomethingAwaitableWhichCanTakeSeconds(record);
        }
        catch(Exception e)
        {
            // add logging
            // this cancels the loop but also the IObservable
            cancellationTokenSource.Cancel();
            // can't rethrow because line
            // above will cause errored http response already created
        }
    }, cancellationTokenSource.Token);
The problem with this is that new records are pushed without waiting for the awaitable task to complete. I could do this with .Wait() instead of an async lambda, but then the thread would be wasted waiting for a lengthy network operation to complete.
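The effect can be reproduced without Rx. The sketch below assumes the cause is that the ForEachAsync callback parameter is an Action, so an async lambda passed to it is compiled as async void and nothing can await it. FireAndForgetSketch and PushAll are hypothetical names used only for this illustration, and Task.Delay stands in for the slow awaitable work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FireAndForgetSketch
{
    // Hypothetical stand-in for a push-based source. It invokes the callback
    // for each item and cannot wait for it, because Action returns void.
    public static void PushAll(int count, Action<int> onNext)
    {
        for (int i = 0; i < count; i++)
        {
            onNext(i);
        }
    }

    // Returns how many callbacks had finished by the time all items were pushed.
    public static int CompletedWhenPushingEnds(int count)
    {
        int completed = 0;

        // The async lambda is converted to Action<int>, i.e. async void:
        // each call returns to PushAll at the first await.
        PushAll(count, async item =>
        {
            await Task.Delay(500); // stands in for the slow awaitable work
            Interlocked.Increment(ref completed);
        });

        return completed;
    }
}
```

With a handful of items, CompletedWhenPushingEnds would be expected to return 0, since every item is pushed long before the first delay elapses. This matches what I see: the next record arrives while the previous one is still being processed.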
This might be important: this is an ASP.NET Web API service.