In stating the requirement for detecting lost messages, you haven't considered the possibility of the last message not arriving. I've added a timeoutDuration which flushes the buffered messages if nothing arrives in the given time; you may want to consider this an error instead (see the Arrange Timeout step below for how to do this).
I will solve this by defining an extension method with the following signature:
public static IObservable<TSource> Sort<TSource>(
    this IObservable<TSource> source,
    Func<TSource, int> keySelector,
    int gapTolerance = 0,
    TimeSpan timeoutDuration = new TimeSpan(),
    IScheduler scheduler = null)
source: the stream of unsorted messages.
keySelector: a function that extracts an int key from a message. I assume the first key sought is 0; amend if necessary.
gapTolerance: the maximum number of messages held back while waiting for an out-of-order message. Pass 0 to hold any number of messages.
timeoutDuration: discussed above. If omitted, there is no timeout.
scheduler: the scheduler to use for the timeout. It is supplied for test purposes; Scheduler.Default is used if none is given.
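To make the keySelector contract concrete, here is a minimal sketch assuming a hypothetical message type Message with an int Sequence property (neither name comes from the original). It also shows one way to "amend if necessary" when your sequence numbers start at 1 rather than 0: shift them in the selector.

```csharp
using System;

// Hypothetical message type, used only for illustration.
public record Message(int Sequence, string Payload);

public static class KeySelectors
{
    // Sequence numbers already start at 0: use them directly as keys.
    public static readonly Func<Message, int> ZeroBased = m => m.Sequence;

    // Sequence numbers start at 1: shift them so that the first key sought is 0.
    public static readonly Func<Message, int> OneBased = m => m.Sequence - 1;
}
```

With an IObservable<Message> called messages, you might then write messages.Sort(KeySelectors.OneBased, gapTolerance: 5).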
Walkthrough
I'll present a line-by-line walkthrough here. The full implementation is repeated below.
Assign Default Scheduler
First of all we must assign a default scheduler if none was supplied:
scheduler = scheduler ?? Scheduler.Default;
Arrange Timeout
Now, if a timeout was requested, we replace the source with a wrapped version that simply sends OnCompleted and terminates if a message doesn't arrive within timeoutDuration.
if(timeoutDuration != TimeSpan.Zero)
    source = source.Timeout(
        timeoutDuration,
        Observable.Empty<TSource>(),
        scheduler);
If you wish to send a TimeoutException instead, just delete the second argument to Timeout (the empty stream) to select an overload that does this. Note that we can safely share this with all subscribers, so it is positioned outside the call to Observable.Create.
Create Subscribe handler
We use Observable.Create to build our stream. The lambda function passed to Create is invoked whenever a subscription occurs, and it is passed the subscribing observer (o). Create returns our IObservable<TSource>, so we return it here.
return Observable.Create<TSource>(o => { ...
Initialize some variables
We will track the next expected key value in nextKey, and create a SortedDictionary to hold the out-of-order messages until they can be sent.
int nextKey = 0;
var buffer = new SortedDictionary<int, TSource>();
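SortedDictionary keeps its entries ordered by key, which is what lets the code below treat buffer.First() as the lowest waiting key and flush the buffer in key order. A small standalone check follows (the BufferOrder class is hypothetical, not part of the answer). One caveat worth knowing: SortedDictionary.Add throws an ArgumentException if the key is already present, so as written a duplicated future message would throw from inside the OnNext handler.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BufferOrder
{
    // Builds a buffer from keys that arrive out of order.
    public static SortedDictionary<int, string> Fill(params int[] keys)
    {
        var buffer = new SortedDictionary<int, string>();
        foreach (var key in keys)
            buffer.Add(key, "message " + key);   // Add throws on a duplicate key
        return buffer;
    }

    // Keys in enumeration order: always ascending, whatever the insertion order.
    public static List<int> KeysInOrder(SortedDictionary<int, string> buffer) =>
        buffer.Keys.ToList();

    // The lowest buffered key, as used for the gap-tolerance bump.
    public static int LowestKey(SortedDictionary<int, string> buffer) =>
        buffer.First().Key;
}
```

Filling it with 9, 7, 8 enumerates as 7, 8, 9, and the lowest key is 7.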
Subscribe to the source, and handle messages
Now we can subscribe to the message stream (possibly with the timeout applied). First we introduce the OnNext handler. The next message is assigned to x:
return source.Subscribe(x => { ...
We invoke the keySelector function to extract the key from the message:
var key = keySelector(x);
If the message has an old key (because it arrived too late: our tolerance for out-of-order messages was exceeded and we moved past it), we are just going to drop it and be done with this message (you may want to act differently):
// drop stale keys
if(key < nextKey) return;
Otherwise, we might have the expected key, in which case we can increment nextKey and send the message:
if(key == nextKey)
{
nextKey++;
o.OnNext(x);
}
Or, we might have an out-of-order future message, in which case we must add it to our buffer. When we do this, we must also check that the buffer hasn't exceeded our tolerance for holding out-of-order messages. If it has, we bump nextKey to the first key in the buffer, which, because it is a SortedDictionary, is conveniently the lowest buffered key:
else if(key > nextKey)
{
buffer.Add(key, x);
if(gapTolerance != 0 && buffer.Count > gapTolerance)
nextKey = buffer.First().Key;
}
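Here is the tolerance check on its own, pulled out into a small pure function so its effect is easy to see. NextKeyAfterBuffering and the GapTolerance class are hypothetical names; the condition is copied from the code above.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GapTolerance
{
    // Returns the key to wait for after a future message has been buffered.
    // If more than gapTolerance messages are held back, stop waiting for the
    // missing key(s) and resume from the lowest buffered key.
    // A gapTolerance of 0 means "wait indefinitely".
    public static int NextKeyAfterBuffering<T>(
        int nextKey, int gapTolerance, SortedDictionary<int, T> buffer)
    {
        if (gapTolerance != 0 && buffer.Count > gapTolerance)
            return buffer.First().Key;
        return nextKey;
    }
}
```

With keys 7, 8 and 9 held back while waiting for 6, a tolerance of 2 gives up on 6 and resumes at 7, a tolerance of 3 keeps waiting, and 0 waits indefinitely. If 6 turns up after the bump, it is stale and the first check drops it.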
Now, regardless of the outcome above, we need to empty the buffer of any keys that are now ready to go. The OnNext handler ends by calling a helper method for this, SendNextConsecutiveKeys(ref nextKey, o, buffer). Note that the helper adjusts nextKey, so we must be careful to pass it by reference. It simply loops over the buffer, reading, removing and sending messages as long as the keys follow on from each other, incrementing nextKey each time:
private static void SendNextConsecutiveKeys<TSource>(
    ref int nextKey,
    IObserver<TSource> observer,
    SortedDictionary<int, TSource> buffer)
{
    TSource x;
    while(buffer.TryGetValue(nextKey, out x))
    {
        buffer.Remove(nextKey);
        nextKey++;
        observer.OnNext(x);
    }
}
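To see the helper in isolation, here is a copy of the same loop driven by a minimal recording observer instead of an Rx one. CollectingObserver and the Drain class are hypothetical stand-ins so the sketch runs without Rx.

```csharp
using System;
using System.Collections.Generic;

// Minimal observer that records OnNext values (a stand-in for the Rx observer o).
public class CollectingObserver<T> : IObserver<T>
{
    public List<T> Values { get; } = new List<T>();
    public void OnNext(T value) => Values.Add(value);
    public void OnError(Exception error) => throw error;
    public void OnCompleted() { }
}

public static class Drain
{
    // Same loop as SendNextConsecutiveKeys: release buffered messages while
    // their keys follow on from nextKey, advancing nextKey each time.
    public static void SendNextConsecutiveKeys<TSource>(
        ref int nextKey,
        IObserver<TSource> observer,
        SortedDictionary<int, TSource> buffer)
    {
        while (buffer.TryGetValue(nextKey, out var x))
        {
            buffer.Remove(nextKey);
            nextKey++;
            observer.OnNext(x);
        }
    }
}
```

With keys 3, 4 and 6 buffered and nextKey at 3, the loop sends 3 and 4, leaves nextKey at 5, and stops because 5 is missing; 6 stays in the buffer.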
Dealing with errors
Next we supply an OnError handler. This just passes through any error, including the TimeoutException if you chose to go that way:
o.OnError,
Flushing the buffer
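Finally, we must handle OnCompleted. Here I have opted to empty the buffer; this would be necessary if an out-of-order message held up later messages and never arrived. This is also why we need a timeout: without one, a source that never completes would leave those messages stuck in the buffer.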
() => {
// empty buffer on completion
foreach(var item in buffer)
o.OnNext(item.Value);
o.OnCompleted();
});
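Since you may prefer to treat a message that never arrives as an error, here is a sketch of both completion policies side by side, written against plain IObserver<T> so it runs without Rx. FlushThenComplete mirrors the handler above; FailIfGapRemains, its exception message and RecordingObserver are hypothetical, showing one possible way to act differently.

```csharp
using System;
using System.Collections.Generic;

public static class CompletionPolicies
{
    // The answer's choice: release everything still buffered, in key order, then complete.
    public static void FlushThenComplete<T>(IObserver<T> o, SortedDictionary<int, T> buffer)
    {
        foreach (var item in buffer)
            o.OnNext(item.Value);
        o.OnCompleted();
    }

    // Hypothetical alternative: leftover messages mean nextKey never arrived,
    // so report an error rather than completing normally.
    public static void FailIfGapRemains<T>(IObserver<T> o, SortedDictionary<int, T> buffer, int nextKey)
    {
        if (buffer.Count > 0)
            o.OnError(new InvalidOperationException(
                $"Key {nextKey} never arrived; {buffer.Count} message(s) still buffered."));
        else
            o.OnCompleted();
    }
}

// Records every notification so the two policies can be compared.
public class RecordingObserver<T> : IObserver<T>
{
    public List<T> Values { get; } = new List<T>();
    public bool Completed { get; private set; }
    public Exception Error { get; private set; }

    public void OnNext(T value) => Values.Add(value);
    public void OnError(Exception error) => Error = error;
    public void OnCompleted() => Completed = true;
}
```

In the Rx version, the alternative would replace the body of the completion lambda; the choice is between delivering possibly incomplete data and surfacing the loss to the subscriber.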
Full Implementation
Here is the full implementation.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Concurrency;
using System.Reactive.Linq;

// Extension methods must live in a static class; this class name is arbitrary.
public static class ObservableSortExtensions
{
    public static IObservable<TSource> Sort<TSource>(
        this IObservable<TSource> source,
        Func<TSource, int> keySelector,
        int gapTolerance = 0,
        TimeSpan timeoutDuration = new TimeSpan(),
        IScheduler scheduler = null)
    {
        scheduler = scheduler ?? Scheduler.Default;

        // complete quietly if nothing arrives within timeoutDuration
        if(timeoutDuration != TimeSpan.Zero)
            source = source.Timeout(
                timeoutDuration,
                Observable.Empty<TSource>(),
                scheduler);

        return Observable.Create<TSource>(o => {
            int nextKey = 0;
            var buffer = new SortedDictionary<int, TSource>();

            return source.Subscribe(x => {
                var key = keySelector(x);

                // drop stale keys
                if(key < nextKey) return;

                if(key == nextKey)
                {
                    nextKey++;
                    o.OnNext(x);
                }
                else if(key > nextKey)
                {
                    buffer.Add(key, x);
                    // too many messages held back: stop waiting for the gap
                    if(gapTolerance != 0 && buffer.Count > gapTolerance)
                        nextKey = buffer.First().Key;
                }

                SendNextConsecutiveKeys(ref nextKey, o, buffer);
            },
            o.OnError,
            () => {
                // empty buffer on completion
                foreach(var item in buffer)
                    o.OnNext(item.Value);
                o.OnCompleted();
            });
        });
    }

    private static void SendNextConsecutiveKeys<TSource>(
        ref int nextKey,
        IObserver<TSource> observer,
        SortedDictionary<int, TSource> buffer)
    {
        TSource x;
        while(buffer.TryGetValue(nextKey, out x))
        {
            buffer.Remove(nextKey);
            nextKey++;
            observer.OnNext(x);
        }
    }
}
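To check the logic without Rx or a scheduler, here is a synchronous sketch that applies the same OnNext and OnCompleted steps to an in-memory list of int keys (each message is its own key, and there is no timeout). SortSimulation.Run is a hypothetical helper for experimentation, not part of the extension method.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SortSimulation
{
    // Applies Sort's OnNext/OnCompleted logic to a list of keys
    // and returns the keys in the order they would be emitted.
    public static List<int> Run(IEnumerable<int> keys, int gapTolerance)
    {
        var output = new List<int>();
        var buffer = new SortedDictionary<int, int>();
        int nextKey = 0;

        foreach (var key in keys)
        {
            if (key < nextKey) continue;            // drop stale keys

            if (key == nextKey)
            {
                nextKey++;
                output.Add(key);
            }
            else
            {
                buffer.Add(key, key);
                if (gapTolerance != 0 && buffer.Count > gapTolerance)
                    nextKey = buffer.First().Key;   // stop waiting for the gap
            }

            // Equivalent of SendNextConsecutiveKeys.
            while (buffer.TryGetValue(nextKey, out var buffered))
            {
                buffer.Remove(nextKey);
                nextKey++;
                output.Add(buffered);
            }
        }

        // Equivalent of the OnCompleted handler: flush the rest in key order.
        output.AddRange(buffer.Values);
        return output;
    }
}
```

For the keys used in the test harness below (0, 2, 1, 4, 5, 3, 7, 8, 9, 6, 12), Run with gapTolerance 2 returns 0, 1, 2, 3, 4, 5, 7, 8, 9, 12: once 7, 8 and 9 are all buffered the tolerance is exceeded, nextKey jumps to 7, and 6 then arrives stale and is dropped. With gapTolerance 0 it returns 0 through 9 followed by 12, with 12 released only by the final flush. No gap in the harness reaches its 200-tick timeout, so the Rx version should print the same sequence as the gapTolerance 2 run.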
Test Harness
If you include the rx-testing NuGet package (the current package id is Microsoft.Reactive.Testing) in a console app, the following will run, giving you a test harness to play with:
using System;
using Microsoft.Reactive.Testing;

public static class Program
{
    public static void Main()
    {
        var tests = new Tests();
        tests.Test();
    }
}

public class Tests : ReactiveTest
{
    public void Test()
    {
        var scheduler = new TestScheduler();
        var xs = scheduler.CreateColdObservable(
            OnNext(100, 0),
            OnNext(200, 2),
            OnNext(300, 1),
            OnNext(400, 4),
            OnNext(500, 5),
            OnNext(600, 3),
            OnNext(700, 7),
            OnNext(800, 8),
            OnNext(900, 9),
            OnNext(1000, 6),
            OnNext(1100, 12),
            OnCompleted(1200, 0));

        xs.Sort(
            keySelector: x => x,
            gapTolerance: 2,
            timeoutDuration: TimeSpan.FromTicks(200),
            scheduler: scheduler).Subscribe(Console.WriteLine);

        scheduler.Start();
    }
}
Closing comments
There are all sorts of interesting alternative approaches here. I went for this largely imperative approach because I think it's easiest to follow, but there are probably some fancy grouping shenanigans you can employ to do this too. One thing I know to be consistently true about Rx: there are always many ways to skin a cat!
I'm also not entirely comfortable with the timeout idea here. In a production system, I would want to implement some means of checking connectivity, such as a heartbeat. I didn't get into this because it will obviously be application-specific. Also, heartbeats have been discussed on these boards and elsewhere before (for example, on my blog).