
I'm trying to implement a cache using a ReplaySubject as follows, but I'm unable to solve the problem using Rx. See the code and accompanying tests. The trouble is that the cache drops the newest entries and preserves the oldest.

public static class RxExtensions
{
    /// <summary>
    /// A cache that keeps distinct elements where the elements are replaced by the latest. Upon subscription the subscriber should receive the full cache contents.
    /// </summary>
    /// <typeparam name="T">The type of the result</typeparam>
    /// <typeparam name="TKey">The type of the selector key for distinct results.</typeparam>
    /// <param name="newElements">The sequence of new elements.</param>
    /// <param name="seedElements">The elements when the cache is started.</param>
    /// <param name="replacementSelector">The replacement to select distinct elements in the cache.</param>
    /// <returns>The cache contents upon first call and changes thereafter.</returns>
    public static IObservable<T> Cache<T, TKey>(this IObservable<T> newElements, IEnumerable<T> seedElements, Func<T, TKey> replacementSelector)
    {
        var replaySubject = new ReplaySubject<T>();
        seedElements.ToObservable().Concat(newElements).Subscribe(replaySubject);

        return replaySubject.Distinct(replacementSelector);
    }
}

It looks like the old ones, the seed values, would be dropped if I write the function as

newElements.Subscribe(replaySubject);
return replaySubject.Concat(seedElements.ToObservable()).Distinct(replacementSelector);

but given how I think .Concat works, this probably only "works" because of how the tests are currently written; see below.

public void CacheTests()
{
    var seedElements = new List<Event>(new[]
    {
        new Event { Id = 0, Batch = 1 },
        new Event { Id = 1, Batch = 1 },
        new Event { Id = 2, Batch = 1 }
    });

    var testScheduler = new TestScheduler();
    var observer = testScheduler.CreateObserver<Event>();
    var batchTicks = TimeSpan.FromSeconds(10);
    var xs = testScheduler.CreateHotObservable
    (
        ReactiveTest.OnNext(batchTicks.Ticks, new Event { Id = 0, Batch = 2 }),
        ReactiveTest.OnNext(batchTicks.Ticks, new Event { Id = 1, Batch = 2 }),
        ReactiveTest.OnNext(batchTicks.Ticks, new Event { Id = 2, Batch = 2 }),
        ReactiveTest.OnNext(batchTicks.Ticks, new Event { Id = 3, Batch = 2 }),
        ReactiveTest.OnNext(batchTicks.Ticks, new Event { Id = 4, Batch = 2 }),
        ReactiveTest.OnNext(batchTicks.Ticks + 10, new Event { Id = 0, Batch = 3 }),
        ReactiveTest.OnNext(batchTicks.Ticks + 10, new Event { Id = 1, Batch = 3 })
    );

    var subs = xs.Cache(seedElements, i => i.Id).Subscribe(observer);
    var seedElementsAndNoMore = observer.Messages.ToArray();
    Assert.IsTrue(observer.Messages.Count == 3);

    testScheduler.Start();
    var seedAndReplacedElements = observer.Messages.ToArray();

    //OK, a bad assert, we should create expected timings and want to check
    //also the actual batch numbers, but to get things going...
    //There should be Events with IDs { 0, 1, 2, 3, 4 } all having a batch number
    //of either 2 or 3. Also, a total of 7 (not 10) events
    //should've been observed.
    Assert.IsTrue(observer.Messages.Count == 7);
    for(int i = 0; i < seedAndReplacedElements.Length; ++i)
    {                
        Assert.IsTrue(seedAndReplacedElements[i].Value.Value.Batch > 1);
    }
}

I think what I'd like to have is

public static IObservable<T> Cache<T, TKey>(this IObservable<T> newElements, IEnumerable<T> seedElements, Func<T, TKey> replacementSelector)
{
    var replaySubject = new ReplaySubject<T>();
    newElements.StartWith(seedElements).Distinct(replacementSelector).Subscribe(replaySubject);

    return replaySubject;           
}

but the trouble is that the seed values come first, so Rx drops the newer values instead of the seed values. Doing it the other way around (maybe using .Merge) could introduce the seed into the observable after new values have already been received, in which case the seed values would never actually be replaced.
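
The first-wins versus latest-wins distinction can be seen without Rx at all. Below is a minimal LINQ-to-objects sketch; `KeepSemantics`, `KeepFirst`, `KeepLatest` and this `Event` class are hypothetical stand-ins (only the `Id` and `Batch` properties come from the tests). `KeepFirst` keeps the same elements that Rx's `Distinct(keySelector)` lets through, while `KeepLatest` keeps what the cache is supposed to hold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Event type; only Id and Batch are used.
public sealed class Event
{
    public int Id { get; set; }
    public int Batch { get; set; }
}

public static class KeepSemantics
{
    // "First wins": the same elements Rx's Distinct(keySelector) lets through,
    // in the order they first arrived.
    public static List<T> KeepFirst<T, TKey>(IEnumerable<T> items, Func<T, TKey> keySelector) =>
        items.GroupBy(keySelector).Select(group => group.First()).ToList();

    // "Latest wins": what the cache should hold. GroupBy yields groups in the
    // order their keys were first seen, so key order is stable.
    public static List<T> KeepLatest<T, TKey>(IEnumerable<T> items, Func<T, TKey> keySelector) =>
        items.GroupBy(keySelector).Select(group => group.Last()).ToList();
}
```

Fed the seed elements followed by the test's updates, `KeepFirst` keeps ids 0, 1 and 2 from batch 1 (plus ids 3 and 4 from batch 2), whereas `KeepLatest` keeps ids 0 and 1 from batch 3 and ids 2, 3 and 4 from batch 2.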

Veksi
  • I can kinda see what you're trying to do but there are a few core misunderstandings here. Firstly, Distinct doesn't work in the way you seem to be intending, Distinct will give you the *first* item received with the specified key, not the last as your assertion seems to expect (batch for ids 0, 1, 2 will always be 1). Secondly, even if it did, as this is a stream of events your observer will see messages for all events from all batches as each event is emitted to the observable. Are you trying to get the latest batch ids within a specific period? – ibebbs Nov 15 '16 at 14:07
  • I'm trying to get either the seed value, a newer version of it, or altogether new values that weren't part of the seeds. When the subscriber subscribes to this cache, it gets all the values held in the cache first and then updates as they come in. I may have made a horrible mistake in the test arrangement: it shows 10 items when in fact I would get the correct 7 had I subscribed the observer later. Hmm... – Veksi Nov 15 '16 at 14:14
  • Is the incoming Observable hot or cold? – Shlomo Nov 15 '16 at 15:33
  • In general it can be hot or cold, so `.Publish().RefCount()` is warranted. – Veksi Nov 15 '16 at 16:12
  • `GroupBy(i=>i.Id).Select(grp=>grp.Replay(1).Publish().RefCount())` ? – Lee Campbell Nov 16 '16 at 10:01
  • Hold on, you never subscribe late?! So why would the subscriber not get all of the values? Your test is wrong. I think you need an early subscriber (gets all 10 values) and a late subscriber (gets only 5 values) – Lee Campbell Nov 16 '16 at 10:12
  • @Lee, The test is wrong. The second Assert should be count 5. – Shlomo Nov 16 '16 at 15:16
  • @LeeCampbell's solution works against the tests of ibebbs: the Cache function can be written as follows: `return seedElements.ToObservable() .Concat(newElements) .GroupBy(i => replacementSelector) .Select(grp => grp.Replay(1).Publish().RefCount()) .Merge();` – Shlomo Nov 16 '16 at 15:18
  • And potentially replace Select+Merge with SelectMany – Lee Campbell Nov 16 '16 at 15:51
  • @LeeCampbell right again. `return seedElements.ToObservable() .Concat(newElements) .GroupBy(i => replacementSelector) .SelectMany(grp => grp.Replay(1).Publish().RefCount());` works as well. I feel so ninja'ed. – Shlomo Nov 16 '16 at 16:52
  • Lee, it looks like @Shlomo did the work on http://stackoverflow.com/questions/40627137/how-should-one-go-about-implementing-a-distinctlatest-and-caching-operator-in already. Would either of you care to turn it into an answer there? And yes, I actually noticed I had a problem in my setup. Lately I've had this bad habit of sitting in afternoon meetings doing "idle programming" and then running off to fetch the kids etc. – Veksi Nov 16 '16 at 20:46

2 Answers

Ok, I think I have what you want. I determined your requirements mostly from the phrase:

When the subscriber subscribes to this cache, it gets all the values held in the cache as the first thing and then after that updates as they come in

I believe the cache is meant to have a lifetime beyond a single subscription (i.e. it should be started, and subscribers can come and go as they please), so I have made it an IConnectableObservable (this is implicit in your code but not scoped correctly).

I have also refactored your test to show multiple subscribers (per @Shlomo's comment) as follows:

[Fact]
public void ReplayAllElements()
{
    var seedElements = new List<Event>(new[]
    {
        new Event { Id = 0, Batch = 1 },
        new Event { Id = 1, Batch = 1 },
        new Event { Id = 2, Batch = 1 }
    });

    var testScheduler = new TestScheduler();

    var xs = testScheduler.CreateHotObservable
    (
        ReactiveTest.OnNext(1, new Event { Id = 0, Batch = 2 }),
        ReactiveTest.OnNext(2, new Event { Id = 1, Batch = 2 }),
        ReactiveTest.OnNext(3, new Event { Id = 2, Batch = 2 }),
        ReactiveTest.OnNext(4, new Event { Id = 3, Batch = 2 }),
        ReactiveTest.OnNext(5, new Event { Id = 4, Batch = 2 }),    
        ReactiveTest.OnNext(6, new Event { Id = 0, Batch = 3 }),
        ReactiveTest.OnNext(7, new Event { Id = 1, Batch = 3 })
    );

    IConnectableObservable<Event> cached = xs.Cache(seedElements, i => i.Id);

    var observerA = testScheduler.CreateObserver<Event>();
    cached.Subscribe(observerA);
    cached.Connect();

    testScheduler.AdvanceTo(4);

    var observerB = testScheduler.CreateObserver<Event>();
    cached.Subscribe(observerB);

    testScheduler.AdvanceTo(7);

    var expectedA = new[]
    {
        ReactiveTest.OnNext<Event>(0, @event => @event.Id == 0 && @event.Batch == 1 ),
        ReactiveTest.OnNext<Event>(0, @event => @event.Id == 1 && @event.Batch == 1 ),
        ReactiveTest.OnNext<Event>(0, @event => @event.Id == 2 && @event.Batch == 1 ),
        ReactiveTest.OnNext<Event>(1, @event => @event.Id == 0 && @event.Batch == 2 ),
        ReactiveTest.OnNext<Event>(2, @event => @event.Id == 1 && @event.Batch == 2 ),
        ReactiveTest.OnNext<Event>(3, @event => @event.Id == 2 && @event.Batch == 2 ),
        ReactiveTest.OnNext<Event>(4, @event => @event.Id == 3 && @event.Batch == 2 ),
        ReactiveTest.OnNext<Event>(5, @event => @event.Id == 4 && @event.Batch == 2 ),
        ReactiveTest.OnNext<Event>(6, @event => @event.Id == 0 && @event.Batch == 3 ),
        ReactiveTest.OnNext<Event>(7, @event => @event.Id == 1 && @event.Batch == 3 )
    };

    observerA.Messages.AssertEqual(expectedA);

    var expectedB = new[]
    {
        ReactiveTest.OnNext<Event>(5, @event => @event.Id == 0 && @event.Batch == 2 ),
        ReactiveTest.OnNext<Event>(5, @event => @event.Id == 1 && @event.Batch == 2 ),
        ReactiveTest.OnNext<Event>(5, @event => @event.Id == 2 && @event.Batch == 2 ),
        ReactiveTest.OnNext<Event>(5, @event => @event.Id == 3 && @event.Batch == 2 ),
        ReactiveTest.OnNext<Event>(5, @event => @event.Id == 4 && @event.Batch == 2 ),
        ReactiveTest.OnNext<Event>(6, @event => @event.Id == 0 && @event.Batch == 3 ),
        ReactiveTest.OnNext<Event>(7, @event => @event.Id == 1 && @event.Batch == 3 )
    };

    observerB.Messages.AssertEqual(expectedB);
}

As you can see, observerA gets all the seed values and the updates whereas observerB gets only the latest value for each key and then further updates.

The code to do this is as follows:

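using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class RxExtensions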
{
    /// <summary>
    /// A cache that keeps distinct elements where the elements are replaced by the latest.
    /// </summary>
    /// <typeparam name="T">The type of the result</typeparam>
    /// <typeparam name="TKey">The type of the selector key for distinct results.</typeparam>
    /// <param name="newElements">The sequence of new elements.</param>
    /// <param name="seedElements">The elements when the cache is started.</param>
    /// <param name="keySelector">Selects the key that decides which cached element a new element replaces.</param>
    /// <returns>The cache contents upon first call and changes thereafter.</returns>
    public static IConnectableObservable<T> Cache<T, TKey>(this IObservable<T> newElements, IEnumerable<T> seedElements, Func<T, TKey> keySelector)
    {
        return new Cache<TKey, T>(newElements, seedElements, keySelector);
    }
}

public class Cache<TKey, T> : IConnectableObservable<T>
{
    private class State
    {
        public ImmutableDictionary<TKey, T> Cache { get; set; }
        public T Value { get; set; }
    }

    private readonly IConnectableObservable<State> _source;
    private readonly IObservable<T> _observable;

    public Cache(IObservable<T> newElements, IEnumerable<T> seedElements, Func<T, TKey> keySelector)
    {
        var agg = new State { Cache = seedElements.ToImmutableDictionary(keySelector), Value = default(T) };

        _source = newElements
            // Use the Scan operator to update the dictionary of values by key, and use the State object to pass both the dictionary and the current item to the next operator
            .Scan(agg, (tuple, item) => new State { Cache = tuple.Cache.SetItem(keySelector(item), item), Value = item })
            // Ensure we always have at least one item
            .StartWith(agg)
            // Share this single subscription to the above with all subscribers
            .Publish();

        _observable = _source.Publish(source =>
                // ... concatenating ...
                Observable.Concat(
                    // ... getting a single collection of values from the cache and flattening it to a series of values ...
                    source.Select(tuple => tuple.Cache.Values).Take(1).SelectMany(values => values),
                    // ... and then returning the values as they're emitted from the source
                    source.Select(tuple => tuple.Value)
                )
            );
    }

    public IDisposable Connect()
    {
        return _source.Connect();
    }

    public IDisposable Subscribe(IObserver<T> observer)
    {
        return _observable.Subscribe(observer);
    }
}
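
For comparison, the same "snapshot first, then live updates" contract can be modeled without Rx. This is a deliberately simplified, single-threaded sketch, not thread-safe and not a replacement for the class above; `LatestByKeyCache`, `Push` and `RecordingObserver` are hypothetical names. One behavioral difference: this model hands a late subscriber the snapshot immediately inside `Subscribe`, whereas in `Cache<TKey, T>` a late subscriber receives the snapshot together with the next `State` emitted by the published source, which is why the first five entries of `expectedB` are all stamped at tick 5.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, single-threaded model of "snapshot first, then live updates".
// Not thread-safe, and not how the Rx-based Cache<TKey, T> above is built.
public sealed class LatestByKeyCache<TKey, T> : IObservable<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly List<TKey> _keyOrder = new List<TKey>();                 // first-seen key order
    private readonly Dictionary<TKey, T> _latest = new Dictionary<TKey, T>(); // latest value per key
    private readonly List<IObserver<T>> _observers = new List<IObserver<T>>();

    public LatestByKeyCache(IEnumerable<T> seedElements, Func<T, TKey> keySelector)
    {
        _keySelector = keySelector;
        foreach (var item in seedElements) Store(item);
    }

    // Replaces (or adds) the cached value for the item's key, then forwards the
    // item to everyone currently subscribed.
    public void Push(T item)
    {
        Store(item);
        foreach (var observer in _observers.ToArray()) observer.OnNext(item);
    }

    public IDisposable Subscribe(IObserver<T> observer)
    {
        // The snapshot: the latest value for every key, in first-seen key order...
        foreach (var key in _keyOrder) observer.OnNext(_latest[key]);
        // ...followed by live updates only.
        _observers.Add(observer);
        return new Subscription(() => _observers.Remove(observer));
    }

    private void Store(T item)
    {
        var key = _keySelector(item);
        if (!_latest.ContainsKey(key)) _keyOrder.Add(key);
        _latest[key] = item;
    }

    private sealed class Subscription : IDisposable
    {
        private Action _unsubscribe;
        public Subscription(Action unsubscribe) { _unsubscribe = unsubscribe; }
        public void Dispose() { _unsubscribe?.Invoke(); _unsubscribe = null; }
    }
}

// Minimal observer that records every value it receives.
public sealed class RecordingObserver<T> : IObserver<T>
{
    public List<T> Values { get; } = new List<T>();
    public void OnNext(T value) => Values.Add(value);
    public void OnError(Exception error) => throw error;
    public void OnCompleted() { }
}
```

Subscribing one observer before any updates and a second one after id 3 of batch 2 gives the second observer the same value sequence as `expectedB` (ids 0 to 4 from batch 2, then ids 0 and 1 from batch 3), only without the test scheduler's timestamps.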

Was certainly an interesting question. The key to the answer was this Publish overload:

    // Summary:
    //     Returns an observable sequence that is the result of invoking the selector on
    //     a connectable observable sequence that shares a single subscription to the underlying
    //     sequence. This operator is a specialization of Multicast using a regular System.Reactive.Subjects.Subject`1.
    //
    // Parameters:
    //   source:
    //     Source sequence whose elements will be multicasted through a single shared subscription.
    //
    //   selector:
    //     Selector function which can use the multicasted source sequence as many times
    //     as needed, without causing multiple subscriptions to the source sequence. Subscribers
    //     to the given source will receive all notifications of the source from the time
    //     of the subscription on.
    //
    // Type parameters:
    //   TSource:
    //     The type of the elements in the source sequence.
    //
    //   TResult:
    //     The type of the elements in the result sequence.
    //
    // Returns:
    //     An observable sequence that contains the elements of a sequence produced by multicasting
    //     the source sequence within a selector function.
    //
    // Exceptions:
    //   T:System.ArgumentNullException:
    //     source or selector is null.
    public static IObservable<TResult> Publish<TSource, TResult>(this IObservable<TSource> source, Func<IObservable<TSource>, IObservable<TResult>> selector);

Anyway, hope it helps.

ibebbs
  • This certainly helps; I'm on my phone currently, so it's difficult to verify. Just to be clear, the intention is that every subscriber, upon first subscription, receives the full cache contents and, while it stays connected, updates (or new elements) as they come in. I was actually trying to avoid using `Dictionary` and was searching for "something elegant". I need to study this later today, thanks! :) – Veksi Nov 15 '16 at 16:24
  • Very nice. Just wondering if there's any way to do it without a class and/or `IConnectableObservable`. – Shlomo Nov 15 '16 at 18:29
  • It's possible to do this with `Publish().RefCount()`, but I felt the subscription shouldn't be bound to a specific observer. I actually did this first, before introducing the `IConnectableObservable`. I can post it if you're interested. – ibebbs Nov 15 '16 at 19:38
  • This seems to do what I was after. As came up in the discussion with @Shlomo, this would perhaps be easier with a `.DistinctLatest` operator. Maybe a subject for a new try! – Veksi Nov 16 '16 at 07:12

This isn't an answer so much as a clarification of your question.

I'm struggling to understand the use case. As @ibebbs pointed out, Distinct doesn't work that way. It looks like you want something like a DistinctLatest.

Here's a marble diagram for your tests. '|' in this diagram represents subscription, not completion. It assumes `new` is a hot observable, s1 subscribes at roughly t=20, and s2 subscribes at roughly t=1:

   t: ------------0--------------10--------------------20------
seed: (10)(11)(12)---------------------------------------------
 new: ---------------------------(20)(21)(22)(23)(24)--(30)(31)
  s1:                                                  |(30)(31)(22)(23)(24)
  s2:              |(10)(11)(12)-(20)(21)(22)(23)(24)--(30)(31)

Is this what you want?


EDIT:

Answer from @LeeCampbell's comments:

public static class RxExtensions
{
    public static IObservable<T> Cache<T, TKey>(this IObservable<T> newElements, IEnumerable<T> seedElements, Func<T, TKey> replacementSelector)
    {
        return seedElements.ToObservable()
            .Concat(newElements)
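            // One group per key (the selector itself is the key selector)
            .GroupBy(replacementSelector)
            // Replay(1) keeps the latest element of each group; RefCount connects it on subscription
            .SelectMany(grp => grp.Replay(1).RefCount());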
    }
}

Shlomo
  • I'm afraid the purpose isn't written plainly enough. I'm aware that `Distinct` is the problem, since the filter preserves the first encountered value and discards "new duplicates". I would like the opposite effect. Alas, I'm not aware of a `DistinctLatest` operator. The use case is that subscribers receive all the cached values upon subscription and, after that, updates or new values. – Veksi Nov 15 '16 at 16:17
  • To be clear, I'm not aware of a `DistinctLatest` operator either. Just seems like that's the functionality you want: Distinctly cache the values by key, preserving the latest values, until subscription, when you unleash everything. – Shlomo Nov 15 '16 at 18:30
  • Added LeeCampbell's answer here. – Shlomo Nov 16 '16 at 17:01
  • Heh, would you care to post this as an answer at http://stackoverflow.com/questions/40627137/how-should-one-go-about-implementing-a-distinctlatest-and-caching-operator-in too? Or would @Lee Campbell mind doing it? :) – Veksi Nov 16 '16 at 20:45