I have a stream of objects, each of which contains a timestamp. I want to partition this stream into non-overlapping windows, in a very similar fashion to Observable.Buffer or Observable.Window. However, I want the window or buffer to close when the timestamp of my object exceeds a certain threshold, rather than when a real-time threshold is exceeded.

For example, suppose I want to partition the data into 30 second windows/buffers and my first object has a timestamp of 00:00:00. When I reach an object whose timestamp exceeds 00:00:30 I want the window/buffer to close and start the cycle again. In this way my objects will be grouped into appropriate buckets based on their timestamp.

The current operators Buffer and Window are very close to what I need but not exactly. For example, if I did something like this:

MySource.Window(TimeSpan.FromSeconds(30))

Then I get all of the objects that my subscription encounters within a 30 second window. The problem with this is that the data is partitioned into a real-time 30 second window, rather than a window based on the timestamp of the object itself.

I guess this requires me to implement an appropriate windowClosingSelector Func, but I'm having difficulty getting this to work.

James World
JMc

1 Answer

Yes, you can do this quite easily.

EDIT - I've revised the solution to use a selector function to obtain the timestamp ticks - this allows for any source element type to be supplied.

Any solution obviously needs to know how to get at the timestamp to work with it. I imagine you have your own element type in mind, but let's assume for argument your elements are typed as System.Reactive.Timestamped<int>. The test example uses this type, but the solution below will work with any type from which you can obtain the ticks of the timestamp.

The following extension method will create windows according to the supplied windowDuration.

public static IObservable<IGroupedObservable<long, TSource>> WindowByTimestamp
    <TSource>(
    this IObservable<TSource> source,
    Func<TSource, long> timestampTicksSelector,
    TimeSpan windowDuration)
{
    long durationTicks = windowDuration.Ticks;

    return source.Publish(ps => 
        ps.GroupByUntil(x => timestampTicksSelector(x) / durationTicks,
                        g => ps.Where(
                        x => timestampTicksSelector(x) / durationTicks != g.Key)));
}

How it works

The trick is to see that this is basically a grouping operation. We create a group key by integer-dividing the element timestamp by the window duration, so every element that falls in the same window gets the same key. We use the ticks of both the duration and the timestamp for convenience.

The timestamp ticks are obtained via a supplied selector function.
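
As a minimal sketch of the key calculation on its own, the snippet below maps a few timestamps onto 30-second windows. WindowKeyDemo and WindowKey are names made up for illustration; they are not part of Rx or of the extension method above.

```csharp
using System;

public static class WindowKeyDemo
{
    // Hypothetical helper: maps a timestamp (in ticks) to the index of the
    // fixed-size window containing it, using integer division.
    public static long WindowKey(long timestampTicks, TimeSpan windowDuration)
    {
        return timestampTicksToKey(timestampTicks, windowDuration.Ticks);
    }

    private static long timestampTicksToKey(long timestampTicks, long durationTicks)
    {
        return timestampTicks / durationTicks;
    }

    public static void Main()
    {
        var window = TimeSpan.FromSeconds(30);
        var times = new[]
        {
            TimeSpan.Zero,
            TimeSpan.FromSeconds(29),
            TimeSpan.FromSeconds(30),
            TimeSpan.FromSeconds(61)
        };

        foreach (var t in times)
        {
            Console.WriteLine("{0} -> window {1}", t, WindowKey(t.Ticks, window));
        }
        // 00:00:00 -> window 0
        // 00:00:29 -> window 0
        // 00:00:30 -> window 1
        // 00:01:01 -> window 2
    }
}
```

Note that the key counts whole durations since tick zero, so window boundaries fall on absolute multiples of the duration rather than being measured from the first element's timestamp.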

Now, to follow Window behaviour, we must expect the timestamps to form a non-decreasing sequence - that is, each timestamp is equal to or later than the preceding one. You should probably check this constraint and make it an error* (see note on this later, and the additional code at the end).

To accomplish this, we must close each group when a new group starts. With the non-decreasing behaviour assumed, all we need to do is use GroupByUntil's duration function to check when an element with a new key appears - this will close the group. There will therefore only ever be one active group: the one for the current window.

Note

*If your timestamps can arrive out of order and you want to accept that, you can just use GroupBy. You won't need the publish mechanism or the duration function of GroupByUntil - but note that the groups will complete only when the source stream completes. You can then use the group key to report the window.
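
A rough sketch of that GroupBy variant might look like the following. The names OutOfOrderWindowExtensions and GroupByTimestampWindow are hypothetical, and the code assumes the System.Reactive package is referenced; it only reuses the same integer-division key as WindowByTimestamp.

```csharp
using System;
using System.Reactive.Linq;

public static class OutOfOrderWindowExtensions
{
    // Hypothetical variant for sources whose timestamps may arrive out of order.
    // Elements are routed to a group by window key; no group is closed early,
    // so every group stays open until the source itself completes.
    public static IObservable<IGroupedObservable<long, TSource>> GroupByTimestampWindow
        <TSource>(
        this IObservable<TSource> source,
        Func<TSource, long> timestampTicksSelector,
        TimeSpan windowDuration)
    {
        long durationTicks = windowDuration.Ticks;

        return source.GroupBy(x => timestampTicksSelector(x) / durationTicks);
    }
}
```

Because no group ever closes before the source does, a late element still lands in the window it belongs to, at the cost of keeping every group alive for the lifetime of the stream.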

On a related note, the return type of WindowByTimestamp is IObservable<IGroupedObservable<long,TSource>>, where long is the key type - this gives you access to the Key property in subsequent operations. In the test below, I used the indexer of SelectMany to create a window number, but using the Key property gives you more flexibility since the key can be anything you like as long as it distinguishes windows. Here the key is the number of times the duration tick count divides into the timestamp tick count, so the keys form an increasing sequence starting at a fairly arbitrary-looking number. Since windows containing no elements never produce a group, the step between consecutive keys will vary too.
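
One way to make the key more readable is to convert it back into the start time of its window. The helper below is only a sketch: WindowKeyBounds and WindowStart are hypothetical names, and it assumes the timestamps carry a zero offset, as the test data in this answer does.

```csharp
using System;

public static class WindowKeyBounds
{
    // Hypothetical helper: turns a window key back into the timestamp at
    // which that window starts. Assumes a zero (UTC) offset.
    public static DateTimeOffset WindowStart(long key, TimeSpan windowDuration)
    {
        return new DateTimeOffset(key * windowDuration.Ticks, TimeSpan.Zero);
    }

    public static void Main()
    {
        // Key 2 with a 200-tick window starts at tick 400.
        Console.WriteLine(WindowStart(2, TimeSpan.FromTicks(200)).Ticks); // prints 400
    }
}
```

With the grouped windows in hand, WindowStart(g.Key, windowDuration) could be used in place of the SelectMany index as the label for each window.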

Test

Here's a test for you to see it working - to be able to use this you'll need to include the NuGet package rx-testing (published as Microsoft.Reactive.Testing in later versions of Rx):

public class Tests : ReactiveTest
{
    public void Scenario1()
    {
        var scheduler = new TestScheduler();
        var live = scheduler.CreateHotObservable<Timestamped<int>>(
          OnNext(100, Timestamped.Create(1, new DateTimeOffset(100, TimeSpan.Zero))),
          OnNext(101, Timestamped.Create(1, new DateTimeOffset(200, TimeSpan.Zero))),
          OnNext(102, Timestamped.Create(2, new DateTimeOffset(300, TimeSpan.Zero))),
          OnNext(103, Timestamped.Create(2, new DateTimeOffset(400, TimeSpan.Zero))),
          OnNext(104, Timestamped.Create(3, new DateTimeOffset(450, TimeSpan.Zero))),
          OnNext(105, Timestamped.Create(3, new DateTimeOffset(455, TimeSpan.Zero))),
          OnCompleted<Timestamped<int>>(105)
        );            

        var windows = live.WindowByTimestamp(
                               x => x.Timestamp.Ticks,
                               TimeSpan.FromTicks(200));

        var numberedWindows = windows.SelectMany((x,i) =>
            x.Select(y => new {
                WindowNumber = i,
                Timestamp = y.Timestamp,
                Value = y.Value }));

        numberedWindows.Subscribe(x => Console.WriteLine(
            "Window: {0}, Time: {1} Value: {2}",
            x.WindowNumber,  x.Timestamp.Ticks, x.Value));

        scheduler.Start();        
    }
}

The output is:

Window: 0, Time: 100 Value: 1
Window: 1, Time: 200 Value: 1
Window: 1, Time: 300 Value: 2
Window: 2, Time: 400 Value: 2
Window: 2, Time: 450 Value: 3
Window: 2, Time: 455 Value: 3

Checking for out-of-order timestamps

Finally, here is an example of one way you might want to check for the non-decreasing timestamp constraint:

    public static IObservable<TSource> EnsureNonDecreasing
        <TSource, TComparedProperty>(
        this IObservable<TSource> source,
        Func<TSource, TComparedProperty> comparedPropertySelector)
        where TComparedProperty : IComparable<TComparedProperty>
    {
        return Observable.Create((IObserver<TSource> o) => {
            bool started = false;
            var last = default(TComparedProperty);

            return source.Subscribe(x => {
                var current = comparedPropertySelector(x);
                if(started && current.CompareTo(last) < 0)
                {
                    // you might want to provide more info here,
                    // such as the offending element
                    // (InvalidDataException lives in System.IO)
                    o.OnError(new InvalidDataException(
                        "Source contained a decreasing element."));
                    return;
                }                    
                started = true;
                last = current;
                o.OnNext(x);
            },
            ex => o.OnError(ex),
            () => o.OnCompleted());                
        });                
    }

To test this, alter the test above to include an out-of-order DateTimeOffset, amend the assignment of the windows variable to include the check and update the Subscribe call to print out the error:

var windows = live.EnsureNonDecreasing(x => x.Timestamp) // added this operator
                  .WindowByTimestamp(
                      x => x.Timestamp.Ticks,
                      TimeSpan.FromTicks(200));

and:

numberedWindows.Subscribe(x => Console.WriteLine(
    "Window: {0}, Time: {1} Value: {2}",
    x.WindowNumber,  x.Timestamp.Ticks, x.Value),
    ex => Console.WriteLine(ex.Message)); // added this line
James World
  • @JMc Added an example test, and fixed an important typo - I had a modulo division operator % in one spot where I should have had an integer division operator /. – James World Dec 16 '14 at 11:26
  • Great response thanks. Need some time to ponder this. – JMc Dec 16 '14 at 11:45
  • @JMc Added an example of how to check for ordered timestamps. – James World Dec 16 '14 at 12:29
  • @JMc Made another refinement; removed dependency on the `Timestamped` type and replaced with a selector function to obtain the ticks of the timestamp of each element. This makes the solution much more reusable. – James World Dec 16 '14 at 14:45