
I have a problem that I do not know how to handle elegantly with RX. I have multiple streams that are all supposed to contain the same elements. However, each stream may lose messages (UDP is involved) or be late/early compared to the others. Each of these messages has a sequence number.

Now what I want to achieve is to get a single stream out of all those streams, without duplicates and keeping the message order. In other words, the same sequence number should not appear twice, and sequence numbers must only increase, never decrease. When a message is lost on all the streams, I'm OK with losing it (as there is a separate TCP mechanism that allows me to ask explicitly for missing messages).

I am looking to do that in RxJava, but I guess my problem is not specific to Java.

Here's a marble diagram to help visualize what I want to achieve: [marble diagram]

You can see in that diagram that we wait for 2 on the first stream before outputting 3 from the second stream. Likewise, 6 is only output once we also receive 6 from the second stream, because only at that point can we know for sure that 5 will never be received on any stream.

Remy
  • Your marble diagram shows 2, 3, and 4 all printing out too early. They would print out in order immediately before the second 6 appears, such that 2, 3, 4, and 6 would be emitted "at the same time." – Timothy Shields Oct 22 '15 at 16:01
  • @Tim - That was my first impression, too. But I think I understand what he is trying to do now. Obviously only one of the 1s is emitted. Then the 3 is produced and cached because 2 has not been emitted. Once 2 is produced it is emitted because 1 has already been emitted at which point the cached 3 is immediately emitted. 4 is produced and emitted because 3 has already been emitted. 6 is produced and cached because 5 has not been emitted. 6 is produced again, indicating 5 is to be skipped, so 6 is emitted and removed from the cache. That is my understanding. – Jason Boyd Oct 22 '15 at 17:30
  • @JasonBoyd: yes that's exactly it :) – Remy Oct 23 '15 at 12:38

2 Answers


This code was typed directly in the browser, so it is untested, but I think it should give you a good idea of how you could solve this.

public static IObservable<T> Sequenced<T>(
    this IObservable<T> source,
    Func<T, int> getSequenceNumber,
    int sequenceBegin,
    int sequenceRedundancy)
{
    return Observable.Create<T>(observer =>
    {
        // The next sequence number in order.
        var sequenceNext = sequenceBegin;

        // The key is the sequence number.
        // The value is (T, Count).
        var counts = new SortedDictionary<int, Tuple<T, int>>();
        return source.Subscribe(
            value =>
            {
                var sequenceNumber = getSequenceNumber(value);

                // If the sequence number for the current value is
                // earlier in the sequence, just throw away this value.
                if (sequenceNumber < sequenceNext)
                {
                    return;
                }

                // Update counts based on the current value.
                Tuple<T, int> count;
                if (!counts.TryGetValue(sequenceNumber, out count))
                {
                    count = Tuple.Create(value, 0);
                }
                count = Tuple.Create(count.Item1, count.Item2 + 1);
                counts[sequenceNumber] = count;

                // If the current count has reached sequenceRedundancy,
                // that means any sequence values S such that
                // sequenceNext <= S < sequenceNumber and S has not been
                // seen yet will never be seen. So we emit everything
                // we have seen up to this point, in order.
                if (count.Item2 >= sequenceRedundancy)
                {
                    var removal = counts.Keys
                        .TakeWhile(seq => seq <= sequenceNumber)
                        .ToList();
                    foreach (var seq in removal)
                    {
                        count = counts[seq];
                        observer.OnNext(count.Item1);
                        counts.Remove(seq);
                    }
                    // Everything up to sequenceNumber is now emitted or skipped.
                    sequenceNext = sequenceNumber + 1;
                }

                // Emit stored values as long as we keep having the
                // next sequence value.
                while (counts.TryGetValue(sequenceNext, out count))
                {
                    observer.OnNext(count.Item1);
                    counts.Remove(sequenceNext);
                    sequenceNext++;
                }
            },
            observer.OnError,
            () =>
            {
                // Emit in order any remaining values.
                foreach (var count in counts.Values)
                {
                    observer.OnNext(count.Item1);
                }
                observer.OnCompleted();
            });
    });
}

If you have two streams IObservable<Message> A and IObservable<Message> B, you would use this by doing Observable.Merge(A, B).Sequenced(msg => msg.SequenceNumber, 1, 2).

For your example marble diagram, this would look like the following, where the source column shows the values emitted by Observable.Merge(A, B) and the counts column shows the contents of the SortedDictionary after each step of the algorithm. I am assuming that the "messages" of the original source sequence (without any lost values) are (A,1), (B,2), (C,3), (D,4), (E,5), (F,6), where the second component of each message is its sequence number.

source | counts
-------|-----------
 (A,1) | --> emit A
 (A,1) | --> skip
 (C,3) | (3,(C,1))
 (B,2) | (3,(C,1)) --> emit B,C and remove C
 (D,4) | --> emit D
 (F,6) | (6,(F,1))
 (F,6) | (6,(F,2)) --> emit F and remove
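
Since the question asks about RxJava, here is a minimal sketch of the same counting logic in plain Java, without any Rx dependency. The class name `Sequencer`, the `accept` method, and the string encoding of messages ("A1" is payload A with sequence number 1) are assumptions made for illustration, not part of the answer above. When a value reaches the redundancy count, the sketch moves the next expected number to one past that value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

// Hypothetical plain-Java port of the counting logic; not an Rx operator.
class Sequencer<T> {
    // A buffered value and the number of copies seen so far.
    private static final class Slot<V> {
        final V value;
        int count;
        Slot(V value) { this.value = value; }
    }

    private final ToIntFunction<T> getSequenceNumber;
    private final int redundancy;
    private int next; // next sequence number expected in order
    private final TreeMap<Integer, Slot<T>> pending = new TreeMap<>();

    Sequencer(ToIntFunction<T> getSequenceNumber, int sequenceBegin, int redundancy) {
        this.getSequenceNumber = getSequenceNumber;
        this.next = sequenceBegin;
        this.redundancy = redundancy;
    }

    // Feeds one value from the merged stream and returns the values
    // that can be emitted now, in sequence order.
    List<T> accept(T value) {
        List<T> out = new ArrayList<>();
        int seq = getSequenceNumber.applyAsInt(value);
        if (seq < next) {
            return out; // already emitted or skipped: drop the duplicate
        }
        Slot<T> slot = pending.computeIfAbsent(seq, k -> new Slot<>(value));
        slot.count++;
        if (slot.count >= redundancy) {
            // Every stream has delivered seq, so any gap below it is lost.
            // Flush all buffered values up to and including seq.
            Map<Integer, Slot<T>> head = pending.headMap(seq, true);
            for (Slot<T> s : head.values()) {
                out.add(s.value);
            }
            head.clear();
            next = seq + 1;
        }
        // Drain buffered values that are now consecutive.
        Slot<T> s;
        while ((s = pending.remove(next)) != null) {
            out.add(s.value);
            next++;
        }
        return out;
    }

    public static void main(String[] args) {
        Sequencer<String> sequencer =
            new Sequencer<>(m -> Integer.parseInt(m.substring(1)), 1, 2);
        // Same merged input as the table above.
        for (String m : new String[] {"A1", "A1", "C3", "B2", "D4", "F6", "F6"}) {
            System.out.println(m + " -> " + sequencer.accept(m));
        }
    }
}
```

In an RxJava pipeline, something like this could presumably sit behind a merge of the sources, with each returned list flattened into the output; that wiring is left out here to keep the sketch free of library calls.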
Timothy Shields
  • This approach requires that the stream completes before any values are emitted. – akarnokd Oct 22 '15 at 15:58
  • @akarnokd Absolutely not. Read the code. It will perform "skips" whenever it detects a next-expected or future value that has reached a count equal to the `sequenceRedundancy` - which is exactly the point at which you can prove that you will never receive the skipped values, even if you keep waiting. – Timothy Shields Oct 22 '15 at 15:59
  • You are right. I've missed an OnNext in the value lambda. – akarnokd Oct 22 '15 at 16:00
  • I think there's still something missing though - but I'm sure the description of my problem was not specific enough to guess it. If we focus on message (F,6), the algorithm actually relies on it being received on both streams. What could happen, though, is that (F,6) is received on the first stream but lost on the second. That means we might only get (F,7) on the second one. In such a case, I believe the algorithm would not work. Kudos, though, for the general direction of the proposal! – Remy Oct 23 '15 at 13:33
  • @Remy why would the same message have different sequence numbers? – Timothy Shields Oct 23 '15 at 15:08

A similar question came up a while ago, and I have a custom merge operator that, when given ordered streams, merges them in order but doesn't do deduplication.

Edit:

If you can "afford" it, you can use this custom merge and then distinctUntilChanged(Func1) to filter out subsequent messages with the same sequence number.

Observable<Message> messages = SortedMerge.create(
    Arrays.asList(src1, src2, src3), (a, b) -> Long.compare(a.id, b.id))
.distinctUntilChanged(v -> v.id);
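
To make the idea concrete, here is a minimal list-based sketch of "merge in order, then drop consecutive duplicates". It is not the `SortedMerge` operator itself: the `SortedMergeSketch` class, the `Message` type, and the sample inputs are assumptions for illustration, and it works on already-collected lists rather than live streams. A streaming version would presumably have to wait until every source has a pending item before it can emit the smallest one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; operates on lists, not Observables.
class SortedMergeSketch {
    // Assumed message shape: a sequence id plus a payload.
    static final class Message {
        final long id;
        final String payload;
        Message(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // The current front item of one source plus the rest of that source.
    private static final class Head {
        final Message message;
        final Iterator<Message> rest;
        Head(Message message, Iterator<Message> rest) {
            this.message = message;
            this.rest = rest;
        }
    }

    // Merges sources that are each sorted by id, skipping repeated ids.
    static List<Message> mergeDistinct(List<List<Message>> sources) {
        PriorityQueue<Head> heads =
            new PriorityQueue<>(Comparator.comparingLong((Head h) -> h.message.id));
        for (List<Message> source : sources) {
            Iterator<Message> it = source.iterator();
            if (it.hasNext()) {
                heads.add(new Head(it.next(), it));
            }
        }
        List<Message> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            Head smallest = heads.poll();
            // Same role as distinctUntilChanged on the id: skip repeats.
            if (out.isEmpty() || out.get(out.size() - 1).id != smallest.message.id) {
                out.add(smallest.message);
            }
            if (smallest.rest.hasNext()) {
                heads.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up inputs: each source is ordered but has lost some messages.
        List<Message> src1 = Arrays.asList(
            new Message(1, "A"), new Message(2, "B"),
            new Message(4, "D"), new Message(6, "F"));
        List<Message> src2 = Arrays.asList(
            new Message(1, "A"), new Message(3, "C"), new Message(6, "F"));
        for (Message m : mergeDistinct(Arrays.asList(src1, src2))) {
            System.out.println(m.id + " " + m.payload);
        }
    }
}
```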
akarnokd
  • That looks very nice, I like the magic of operators - that's one of the things that convinced me RX was nice. Did you consider making such operators available in a library? That would probably help to afford using it :) (re-usable, tested by many, etc) Thanks for your input! – Remy Oct 23 '15 at 13:42