order reactive extension events

Question

I am receiving messages over UDP on multiple threads. After each reception I raise MessageReceived.OnNext(message).

Because I am using multiple threads, the messages are raised out of order, which is a problem.

How can I order the raising of messages by the message counter? (Let's say there is a message.counter property.)

Keep in mind that a message can get lost in transit. (Let's say that if a gap in the counter has not been filled after X messages, I raise the next message.)

Messages must be raised as soon as possible (as soon as the next counter value has been received).

score 6 · Accepted Answer · answered Oct 19 '14 at 16:57

In stating the requirement for detecting lost messages, you haven't considered the possibility of the last message not arriving. I've added a timeoutDuration which flushes the buffered messages if nothing arrives in the given time. You may want to treat this as an error instead; the Arrange Timeout step below explains how to do that.

I will solve this by defining an extension method with the following signature:

public static IObservable<TSource> Sort<TSource>(
    this IObservable<TSource> source,
    Func<TSource, int> keySelector,
    int gapTolerance = 0,
    TimeSpan timeoutDuration = new TimeSpan(),
    IScheduler scheduler = null)

source is the stream of unsorted messages.
keySelector is a function that extracts an int key from a message. I assume the first key sought is 0; amend if necessary.
timeoutDuration is discussed above; if omitted, there is no timeout.
gapTolerance is the maximum number of messages held back while waiting for an out-of-order message. Pass 0 to hold any number of messages.
scheduler is the scheduler to use for the timeout and is supplied for test purposes; a default is used if not given.

Walkthrough

I'll present a line-by-line walkthrough here. The full implementation is repeated below.

Assign Default Scheduler

First of all we must assign a default scheduler if none was supplied:

scheduler = scheduler ?? Scheduler.Default;

Arrange Timeout

Now, if a timeout was requested, we replace the source with a copy that simply terminates and sends OnCompleted if no message arrives within timeoutDuration.

if(timeoutDuration != TimeSpan.Zero)
    source = source.Timeout(
        timeoutDuration,
        Observable.Empty<TSource>(),
        scheduler);

If you wish to send a TimeoutException instead, just delete the second argument to Timeout (the empty stream) to select the overload that does this. Note that we can safely share this with all subscribers, so it is positioned outside the call to Observable.Create.
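To make the difference between the two Timeout overloads concrete, here is a minimal sketch that probes both against a source that never produces anything. It assumes the System.Reactive package is referenced; the class name TimeoutVariants, the method name Probe and the 50 ms duration are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Reactive;
using System.Reactive.Linq;

public static class TimeoutVariants
{
    // Returns the kind of the last notification seen when a silent source times out.
    public static NotificationKind Probe(bool throwOnTimeout)
    {
        var silent = Observable.Never<int>();
        var due = TimeSpan.FromMilliseconds(50);

        var guarded = throwOnTimeout
            ? silent.Timeout(due)                            // no fallback: OnError(TimeoutException)
            : silent.Timeout(due, Observable.Empty<int>());  // fallback to empty: OnCompleted

        // Materialize turns the terminal event into a value so Wait can return it.
        return guarded.Materialize().Wait().Kind;
    }
}
```

Probe(true) should report OnError and Probe(false) should report OnCompleted, which is the choice the paragraph above describes.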

Create Subscribe handler

We use Observable.Create to build our stream. The lambda function that is the argument to Create is invoked whenever a subscription occurs and we are passed the calling observer (o). Create returns our IObservable<T> so we return it here.

return Observable.Create<TSource>(o => { ...

Initialize some variables

We will track the next expected key value in nextKey, and create a SortedDictionary to hold the out of order messages until they can be sent.

int nextKey = 0;  
var buffer = new SortedDictionary<int, TSource>();
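Because these variables are declared inside the Create lambda, every subscriber gets its own copy. The following sketch shows that effect in isolation with a simple counter rather than the sort state. It assumes the System.Reactive package; PerSubscriptionState and CountItems are hypothetical names used only for this illustration.

```csharp
using System;
using System.Reactive.Linq;

public static class PerSubscriptionState
{
    // Emits a running count of items seen; the count restarts for each subscriber.
    public static IObservable<int> CountItems<T>(IObservable<T> source)
    {
        return Observable.Create<int>(o =>
        {
            int count = 0; // created afresh every time someone subscribes
            return source.Subscribe(
                _ => o.OnNext(++count),
                o.OnError,
                o.OnCompleted);
        });
    }
}
```

Subscribing twice to CountItems(Observable.Range(10, 3)) should give each subscriber 1, 2, 3. If the counter were declared outside Create, the second subscriber would continue from where the first left off, which is the sharing problem nextKey and buffer avoid.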

Subscribe to the source, and handle messages

Now we can subscribe to the message stream (possibly with the timeout applied). First we introduce the OnNext handler. The next message is assigned to x:

return source.Subscribe(x => { ...

We invoke the keySelector function to extract the key from the message:

var key = keySelector(x);

If the message has an old key (because it exceeded our tolerance for out of order messages) we are just going to drop it and be done with this message (you may want to act differently):

// drop stale keys
if(key < nextKey) return;

Otherwise, we might have the expected key, in which case we can increment nextKey and send the message:

if(key == nextKey)
{
    nextKey++;
    o.OnNext(x);                    
}

Or, we might have an out-of-order future message, in which case we must add it to our buffer. If we do this, we must also check that the buffer hasn't exceeded our tolerance for storing out-of-order messages. If it has, we give up on the missing keys and bump nextKey to the first key in the buffer which, because the buffer is a SortedDictionary, is conveniently the lowest key held:

else if(key > nextKey)
{
    buffer.Add(key, x);
    if(gapTolerance != 0 && buffer.Count > gapTolerance)
        nextKey = buffer.First().Key;
}

Now, regardless of the outcome above, we need to empty the buffer of any keys that are now ready to go. We use a helper method for this, called at the end of the OnNext handler as SendNextConsecutiveKeys(ref nextKey, o, buffer). Note that it adjusts nextKey, so we must be careful to pass it by reference. It simply loops over the buffer, reading, removing and sending messages as long as the keys follow on from each other, incrementing nextKey each time:

private static void SendNextConsecutiveKeys<TSource>(
    ref int nextKey,
    IObserver<TSource> observer,
    SortedDictionary<int, TSource> buffer)
{
    TSource x;
    while(buffer.TryGetValue(nextKey, out x))
    {
        buffer.Remove(nextKey);
        nextKey++;
        observer.OnNext(x);                        
    }
}

Dealing with errors

Next we supply an OnError handler - this will just pass through any error, including the Timeout exception if you chose to go that way.

Flushing the buffer

Finally, we must handle OnCompleted. Here I have opted to empty the buffer - this would be necessary if an out of order message held up messages and never arrived. This is why we need a timeout:

() => {
    // empty buffer on completion
    foreach(var item in buffer)
        o.OnNext(item.Value);                
    o.OnCompleted();
});

Full Implementation

Here is the full implementation.

public static IObservable<TSource> Sort<TSource>(
    this IObservable<TSource> source,
    Func<TSource, int> keySelector,
    int gapTolerance = 0,
    TimeSpan timeoutDuration = new TimeSpan(),
    IScheduler scheduler = null)
{       
    scheduler = scheduler ?? Scheduler.Default;

    if(timeoutDuration != TimeSpan.Zero)
        source = source.Timeout(
            timeoutDuration,
            Observable.Empty<TSource>(),
            scheduler);

    return Observable.Create<TSource>(o => {
        int nextKey = 0;  
        var buffer = new SortedDictionary<int, TSource>();

        return source.Subscribe(x => {
            var key = keySelector(x);

            // drop stale keys
            if(key < nextKey) return;

            if(key == nextKey)
            {
                nextKey++;
                o.OnNext(x);                    
            }
            else if(key > nextKey)
            {
                buffer.Add(key, x);
                if(gapTolerance != 0 && buffer.Count > gapTolerance)
                    nextKey = buffer.First().Key;
            }
            SendNextConsecutiveKeys(ref nextKey, o, buffer);
        },
        o.OnError,
        () => {
            // empty buffer on completion
            foreach(var item in buffer)
                o.OnNext(item.Value);                
            o.OnCompleted();
        });
    });
}

private static void SendNextConsecutiveKeys<TSource>(
    ref int nextKey,
    IObserver<TSource> observer,
    SortedDictionary<int, TSource> buffer)
{
    TSource x;
    while(buffer.TryGetValue(nextKey, out x))
    {
        buffer.Remove(nextKey);
        nextKey++;
        observer.OnNext(x);                        
    }
}

Test Harness

If you include the NuGet package rx-testing in a console app, the following will run, giving you a test harness to play with:

public static void Main()
{
    var tests = new Tests();
    tests.Test();
}

public class Tests : ReactiveTest
{
    public void Test()
    {
        var scheduler = new TestScheduler();

        var xs = scheduler.CreateColdObservable(
            OnNext(100, 0),
            OnNext(200, 2),
            OnNext(300, 1),
            OnNext(400, 4),
            OnNext(500, 5),
            OnNext(600, 3),
            OnNext(700, 7),
            OnNext(800, 8),
            OnNext(900, 9),            
            OnNext(1000, 6),
            OnNext(1100, 12),
            OnCompleted(1200, 0));

        //var results = scheduler.CreateObserver<int>();

        xs.Sort(
            keySelector: x => x,
            gapTolerance: 2,
            timeoutDuration: TimeSpan.FromTicks(200),
            scheduler: scheduler).Subscribe(Console.WriteLine);

        scheduler.Start();
    }
}
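To reason about what the harness should print without pulling in Rx, the same buffering rules can be replayed over the arrival order with a plain loop. This is only a sketch of the logic from the walkthrough, not the operator itself: it ignores timing and the timeout, and the names ReorderTrace and Run are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReorderTrace
{
    // Replays the walkthrough's rules over a fixed arrival order and
    // returns the keys in the order they would be emitted.
    public static List<int> Run(IEnumerable<int> arrivals, int gapTolerance)
    {
        var emitted = new List<int>();
        var buffer = new SortedDictionary<int, int>();
        int nextKey = 0; // first expected key, as assumed in the answer

        foreach (var key in arrivals)
        {
            if (key < nextKey) continue; // stale key: drop it

            if (key == nextKey)
            {
                nextKey++;
                emitted.Add(key);
            }
            else
            {
                buffer.Add(key, key);
                // Too many messages held back: give up on the gap and skip ahead.
                if (gapTolerance != 0 && buffer.Count > gapTolerance)
                    nextKey = buffer.First().Key;
            }

            // Release any buffered keys that are now consecutive.
            int value;
            while (buffer.TryGetValue(nextKey, out value))
            {
                buffer.Remove(nextKey);
                nextKey++;
                emitted.Add(value);
            }
        }

        // Completion: flush whatever is still waiting, in key order.
        emitted.AddRange(buffer.Values);
        return emitted;
    }

    public static void Main()
    {
        var arrivals = new[] { 0, 2, 1, 4, 5, 3, 7, 8, 9, 6, 12 };
        Console.WriteLine(string.Join(", ", Run(arrivals, gapTolerance: 2)));
        // prints "0, 1, 2, 3, 4, 5, 7, 8, 9, 12"
    }
}
```

In this trace, 6 is dropped because the buffer exceeded the tolerance while 7, 8 and 9 were waiting, so nextKey had already moved past it. Keys 10 and 11 never arrive, so 12 is only released by the flush on completion.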

Closing comments

There are all sorts of interesting alternative approaches here. I went for this largely imperative approach because I think it's easiest to follow, but there are probably some fancy grouping shenanigans you can employ to do this too. One thing I know to be consistently true about Rx: there are always many ways to skin a cat!

I'm also not entirely comfortable with the timeout idea here - in a production system, I would want to implement some means of checking connectivity, such as a heartbeat or similar. I didn't get into this because obviously it will be application specific. Also, heartbeats have been discussed on these boards and elsewhere before (such as on my blog for example).

I had considered creating an `OrderByUntil` variant of Rxx's `OrderBy` operators that would have solved this problem in a similar way, but ultimately it just seemed strange to me. Do you think it's worth implementing or it's just a decent solution to a bad question in general? — Dave Sexton, Oct 19 '14 at 17:00

score 4 · Answer 2 · answered Oct 19 '14 at 16:58

Strongly consider using TCP instead if you want reliable ordering - that's what it's for; otherwise, you'll be forced to play a guessing game with UDP and sometimes you'll be wrong.

For example, imagine that you receive the following datagrams in this order: [A, B, D]

When you receive D, how long should you wait for C to arrive before pushing D?

Whatever duration you choose you may be wrong:

What if C was lost during transmission and so it will never arrive?
What if the duration you chose is too short and you end up pushing D but then receive C?

Perhaps you could choose a duration that heuristically works best, but why not just use TCP instead?

Side Note:

MessageReceived.OnNext implies that you're using a Subject<T>, which is probably unnecessary. Consider converting the async UdpClient methods into observables directly instead, or convert them by writing an async iterator via Observable.Create<T>(async (observer, cancel) => { ... }).
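As a rough sketch of the async-iterator suggestion, a single receive loop can be wrapped with the Observable.Create overload that takes an observer and a CancellationToken. It assumes the System.Reactive package; UdpObservable and Datagrams are hypothetical names, and error handling and socket lifetime are left to the caller.

```csharp
using System;
using System.Net.Sockets;
using System.Reactive.Linq;

public static class UdpObservable
{
    // Hypothetical helper: exposes datagrams received on the given client as an observable.
    public static IObservable<UdpReceiveResult> Datagrams(UdpClient client)
    {
        return Observable.Create<UdpReceiveResult>(async (observer, cancel) =>
        {
            // One receive loop per subscription; the token is cancelled on unsubscribe.
            while (!cancel.IsCancellationRequested)
            {
                var datagram = await client.ReceiveAsync();
                observer.OnNext(datagram);
            }
        });
    }
}
```

With a single loop, datagrams are pushed in the order the socket delivers them, so no Subject<T> is needed. Note that the parameterless ReceiveAsync does not observe the token, so after unsubscribing the loop only ends once another datagram arrives or the client is closed. The network can still reorder or drop datagrams, so this does not replace the sorting logic.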

Dave Sexton

Definitely agree with Dave regarding TCP - but the ordering problem is fun, and it does occasionally have valid applications - I came across it in a custom Tibco messaging scenario where messages were being gathered from multiple machines. +1 Dave – James World Oct 19 '14 at 17:00
Well, I definitely agree that it's fun, especially because Rx doesn't offer anything really straightforward to solve this problem out-of-the-box. Though I imagine that something like the following would work: `datagrams.Buffer(time).Scan(...).Select(...)` – Dave Sexton Oct 19 '14 at 17:06
Though that last query means that all notifications are delayed. It seems that `Observable.Create` is actually simpler. – Dave Sexton Oct 19 '14 at 17:08
Also, don't forget about WCF's many features that can help here. http://msdn.microsoft.com/en-us/magazine/cc163648.aspx for example. – James World Oct 19 '14 at 20:01