
I have an Observable<Snapshot> stream that, when subscribed to, replays the event log of every Snapshot produced for every live entity (and any new Snapshots after the subscription). This replay can contain multiple Snapshots for the same entity. An entity is considered live if it has had a snapshot in the last 24 hours.

I am trying to set up a hot observable that, when subscribed to, will replay only the latest Snapshot for a given entity and any new snapshots after the subscription, for display in a table in the UI.
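
For context, the relevant shape of the types is roughly the following (a simplified stand-in, not the real application classes; only getId() matters for the grouping below):

import java.time.Instant;

// Simplified stand-in for the real Snapshot type.
class Snapshot {
    private final String id;        // entity id used for grouping
    private final Instant takenAt;  // production time; drives the 24-hour liveness window

    Snapshot(String id, Instant takenAt) {
        this.id = id;
        this.takenAt = takenAt;
    }

    String getId() { return id; }
    Instant getTakenAt() { return takenAt; }
}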

Here is the code I have:

  Observable<Snapshot> snapshots = getContinuousSnapshotsStream();
  
  var cache = snapshots.groupBy(Snapshot::getId)
    .map(g -> {
        var o = g.timeout(1, TimeUnit.DAYS)
          .onErrorComplete()
          .replay(1);
        o.connect();
        return o.hide();
    })
    .replay(); // memory leak, holds onto references to groups that have timed out already

  // start cache
  cache.connect();
  
  // each client UI subscription will flatten this
  cache.flatMap(o -> o);

As you can see from the comment, the replay on the cache will hold onto groups that have already timed out. I need a way, on onComplete() of o, to remove it from the replay.

Are there any Rx operators I can use to achieve my goal without managing a separate cache myself?

Cheetah
  • For one, there is a replay overload that eagerly removes old entries to avoid leaks: http://reactivex.io/RxJava/3.x/javadoc/io/reactivex/rxjava3/core/Observable.html#replay-int-boolean- – akarnokd Oct 13 '22 at 14:08
  • @akarnokd - the issue is that the completed observable won't be marked for removal in the first place... – Cheetah Oct 13 '22 at 15:08
  • I don't think this can be done with existing operators. You could get rid of the cached group's item by materializing after `onErrorComplete` so the `replay(1, true)` holds onto a value or the completion indicator (see the sketch below). However, removing the entire timed out group from the outer `replay` is not possible. – akarnokd Oct 15 '22 at 06:37
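
For illustration, the materialize() idea from the last comment could be applied inside the question's groupBy mapping roughly like this (a sketch only: it lets the per-group replay release its last value once the group completes, but the completed group itself still remains in the outer replay):

var o = g.timeout(1, TimeUnit.DAYS)
  .onErrorComplete()
  .materialize()    // values and the completion signal become Notification items
  .replay(1, true); // eager truncation: only the newest Notification is retained
o.connect();
// consumers turn the Notifications back into values/completion
return o.dematerialize(n -> n).hide();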

2 Answers


Unfortunately, this can't be achieved with standard operators. You'd have to write a custom operator, which can be approximated by using some standard operators and manual internal state management. This is what I came up with:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

import io.reactivex.rxjava3.core.Emitter;
import io.reactivex.rxjava3.core.Observable;
import io.reactivex.rxjava3.core.Scheduler;
import io.reactivex.rxjava3.disposables.Disposable;
import io.reactivex.rxjava3.subjects.PublishSubject;
import io.reactivex.rxjava3.subjects.Subject;

// Messages processed one at a time by the internal event loop below.
record MessageValue<T>(T value) { }                             // a new upstream item
record MessageSubscriber<T>(Emitter<T> emitter) { }             // a new downstream subscriber
record MessageDisposed<T>(Emitter<T> emitter) { }               // a downstream subscriber was disposed
record MessageTimeout<K>(K key, long id) { }                    // a cached entry's timeout fired
record CachedValue<T>(T value, long id, Disposable timeout) { } // cached value plus its pending timeout task

public static <T, K> Observable<T> groupCacheLatestWithTimeout(
        Observable<T> source, 
        long timeout, TimeUnit unit, Scheduler scheduler,
        Function<T, K> keySelector) {
    Subject<Object> messageQueue = PublishSubject.create().toSerialized(); // serialized: messages are handled one at a time
    
    Map<K, CachedValue<T>> cache = new LinkedHashMap<>();
    List<Emitter<T>> emitters = new ArrayList<>();
    
    AtomicLong idGenerator = new AtomicLong();

    // Every new downstream subscriber and its disposal are routed through the event loop.
    var result = Observable.<T>create(emitter -> {
        messageQueue.onNext(new MessageSubscriber<T>(emitter));
        emitter.setCancellable(() -> {
            messageQueue.onNext(new MessageDisposed<T>(emitter));
        });
    });
    
    // The serialized subscription below is the event loop that owns the cache and the emitter list.
    messageQueue.subscribe(message -> {
        if (message instanceof MessageValue) {
            // New upstream value: cache it, restart its timeout and broadcast it.
            var mv = ((MessageValue<T>)message).value();
            var key = keySelector.apply(mv);
            
            var old = cache.get(key);
            if (old != null) {
                old.timeout().dispose(); // cancel the superseded entry's pending timeout
            }
            
            var id = idGenerator.incrementAndGet();
            
            // schedule eviction of this entry after the configured timeout
            var dispose = scheduler.scheduleDirect(() -> {
                messageQueue.onNext(new MessageTimeout<K>(key, id));
            }, timeout, unit);
            
            cache.put(key, new CachedValue<T>(mv, id, dispose));
            
            for (var emitter : emitters) {
                emitter.onNext(mv);
            }
        }
        else if (message instanceof MessageSubscriber) {
            // New subscriber: register it and replay each group's cached latest value.
            var me = ((MessageSubscriber<T>)message).emitter();
            emitters.add(me);
            
            for (var entry : cache.values()) {
                me.onNext(entry.value());
            }
        }
        else if (message instanceof MessageDisposed) {
            // A subscriber was disposed: stop emitting to it.
            var md = ((MessageDisposed<T>)message).emitter();
            emitters.remove(md);
        }
        else if (message instanceof MessageTimeout) {
            var mt = ((MessageTimeout<K>)message);
            
            // Evict only if the timeout still belongs to the currently cached entry.
            var entry = cache.get(mt.key());
            if (entry.id() == mt.id()) {
                cache.remove(mt.key());
            }
        }
    });
    
    // Feed upstream values into the event loop.
    source.subscribe(value -> {
        messageQueue.onNext(new MessageValue<>(value));
    });

    return result;
}

What needs to happen is to create an event loop and serialize all interactions with the cache: cache the latest upstream item, time out and evict cache entries, register new subscribers (replaying the cached values to them), and remove disposed subscribers.

You can test it like this:

var subject = PublishSubject.<String>create();
var sched = new TestScheduler();
var output = groupCacheLatestWithTimeout(subject, 
        5, TimeUnit.SECONDS, sched, 
        k -> k.substring(0, 2));

var to1 = output.test();

to1.assertEmpty();

subject.onNext("g1-1");

to1.assertValuesOnly("g1-1");

subject.onNext("g1-2");

to1.assertValuesOnly("g1-1", "g1-2");

var to2 = output.test();

to2.assertValuesOnly("g1-2");

sched.advanceTimeBy(10, TimeUnit.SECONDS);

to1.assertValuesOnly("g1-1", "g1-2");
to2.assertValuesOnly("g1-2");

var to3 = output.test();

to3.assertEmpty();

subject.onNext("g1-3");

to1.assertValuesOnly("g1-1", "g1-2", "g1-3");
to2.assertValuesOnly("g1-2", "g1-3");
to3.assertValuesOnly("g1-3");

to1.dispose();

subject.onNext("g1-4");

to1.assertValuesOnly("g1-1", "g1-2", "g1-3");
to2.assertValuesOnly("g1-2", "g1-3", "g1-4");
to3.assertValuesOnly("g1-3", "g1-4");
akarnokd

This is the solution I went with. Whilst I think it's subjectively simpler than what @akarnokd replied with, I would not have been able to come up with it without having read that answer first.

Observable<Snapshot> snapshots = getContinuousSnapshotsStream();

var scheduler = Schedulers.single(); // synchronizer to ensure no race condition where we miss events.
var cache = new LinkedHashMap<String, Observable<Snapshot>>();
var events = snapshots.groupBy(Snapshot::getId)
    .observeOn(scheduler)
    .map(g -> {
        var o = g.timeout(1, TimeUnit.DAYS)
          // when the group times out (or otherwise terminates), drop it from the cache
          .doFinally(() -> scheduler.scheduleDirect(() -> cache.remove(g.getKey())))
          .replay(1);
        cache.put(g.getKey(), o);
        return o.autoConnect(-1); // connect immediately so the latest snapshot is always cached
    })
    .publish()
    .autoConnect(-1);

// each client UI subscription would subscribe to this:
var sub = Observable.merge(events, Observable.fromIterable(cache.values()))
    .subscribeOn(scheduler) // subscribe on the same single thread so no group is missed
    .flatMap(r -> r);
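
For reference, a client could consume this roughly as follows (a sketch; updateTableRow and showError are hypothetical UI callbacks, not part of the code above):

// Each emitted Snapshot is the latest one for its entity id.
Disposable tableSubscription = sub.subscribe(
    snapshot -> updateTableRow(snapshot),  // hypothetical: insert/update the entity's table row
    error -> showError(error));            // hypothetical: surface stream failures

// when the table view goes away:
tableSubscription.dispose();
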
Cheetah