
I'm writing a kind of middleware HTTP proxy with a cache. The workflow is:

  1. A client requests a resource from the proxy.
  2. If the resource exists in the cache, the proxy returns it.
  3. If the resource wasn't found, the proxy fetches the remote resource and returns it to the client, saving the resource to the cache as the data is loaded.

My interfaces are: a remote resource exposing a Publisher<ByteBuffer> stream, a cache that accepts a Publisher<ByteBuffer> to save, and the client's connection, which accepts a Publisher<ByteBuffer> as the response:

// remote resource
interface Resource {
  Publisher<ByteBuffer> fetch();
}

// cache
interface Cache {
  Completable save(Publisher<ByteBuffer> data);
}

// client response connection
interface Connection {
  Completable send(Publisher<ByteBuffer> data);
}

My problem is that I need to lazily save this stream of byte buffers to the cache while sending the response to the client, so the client, not the cache, should be responsible for requesting ByteBuffer chunks from the remote resource.
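
In Reactive Streams terms, "the client drives demand" means the downstream Subscriber calls Subscription.request(n) to pull data. Purely as an illustration (this is not part of my proxy code), a hand-written Subscriber that pulls one chunk at a time would look roughly like this:

// Illustration only: a Subscriber that requests one ByteBuffer at a time,
// so the upstream Publisher never produces faster than this consumer reads.
Subscriber<ByteBuffer> oneAtATime = new Subscriber<ByteBuffer>() {
  private Subscription subscription;

  @Override public void onSubscribe(Subscription s) {
    subscription = s;
    s.request(1);               // ask for the first chunk
  }

  @Override public void onNext(ByteBuffer buffer) {
    // ... write the buffer to the client connection ...
    subscription.request(1);    // pull the next chunk only when ready
  }

  @Override public void onError(Throwable t) { /* propagate the failure */ }
  @Override public void onComplete() { /* finish the response */ }
};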

I tried to use the Publisher::cache method, but it's not a good choice for me because it keeps all received data in memory. That's not acceptable, since the cached data may be a few GB in size.
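
For reference, this is roughly what that rejected approach looks like (a sketch, assuming Cache.save and Connection.send each subscribe to the given Publisher exactly once): Flowable.cache() replays every item to each subscriber, so every buffer stays in memory for the lifetime of the stream.

// Sketch of the rejected approach: Flowable.cache() replays all items
// to every subscriber, so the whole response body is retained in memory.
Completable proxyWithCacheOperator(Resource res) {
  Flowable<ByteBuffer> replayAll = Flowable.fromPublisher(res.fetch()).cache();
  return Completable.mergeArray(
      out.send(replayAll),
      cache.save(replayAll));
}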

As a workaround, I created a Subject that is fed the items received from the Resource:

private final Cache cache;
private final Connection out;

Completable proxy(Resource res) {
  Subject<ByteBuffer> mirror = PublishSubject.create();
  return Completable.mergeArray(
    out.send(res.fetch().doOnNext(mirror::onNext)),
    cache.save(mirror.toFlowable(BackpressureStrategy.BUFFER))
  );
}

Is it possible to reuse the same Publisher without caching items in memory, where only one subscriber is responsible for requesting items from the publisher?

Kirill
  • I can't understand you when you say: *so the client should be responsible for requesting ByteBuffer chunks from remote resource, not cache*! – bubbles Jun 17 '20 at 10:52
  • @bubbles I mean the back-pressure: the `Subscription` of a `Publisher` has the method `request(long n)`, which requests the next `n` items from the `Publisher`. The client's `Connection` is slower than the `Cache`, so only the `Connection` should be responsible for requesting the next `n` `ByteBuffer`s from the remote `Publisher` of the `Resource` – Kirill Jun 17 '20 at 12:04
  • What version of RxJava is this? I'm on 2.x and `Publisher` only has one method: `Publisher.subscribe(Subscriber<? super T>)` – TrogDor Jun 25 '20 at 14:52

1 Answer


I might be missing something (I added a comment about my version of the Publisher interface being different).

But here's how I would do something like this conceptually.

I'm going to simplify the interfaces to deal with Integers:

// remote resource
interface Resource {
  ConnectableObservable<Integer> fetch();
}

// cache
interface Cache {
  Completable save(Integer data);
}

// client response connection
interface Connection {
  Completable send(Integer data);
}

I'd use Observable::publish to create a ConnectableObservable and establish two subscriptions:

@Test
public void testProxy()
{
    // Override schedulers:
    TestScheduler s = new TestScheduler();
    
    RxJavaPlugins.setIoSchedulerHandler(
            scheduler -> s );
    RxJavaPlugins.setComputationSchedulerHandler(
            scheduler -> s );
    
    // Mock interfaces:
    Resource resource = () -> Observable.range( 1, 100 )
            .publish();
    
    Cache cache = data -> Completable.fromObservable( Observable.just( data )
                .delay( 100, TimeUnit.MILLISECONDS )
                .doOnNext( __ -> System.out.println( String.format( "Caching %d", data ))));
    
    Connection connection = data -> Completable.fromObservable( Observable.just( data )
                .delay( 500, TimeUnit.MILLISECONDS )
                .doOnNext( __ -> System.out.println( String.format( "Sending %d", data ))));
    
    // Subscribe to resource:
    ConnectableObservable<Integer> observable = resource.fetch();
    
    observable
        .observeOn( Schedulers.io() )
        .concatMapCompletable( data -> connection.send( data ))
        .subscribe();
    
    observable
        .observeOn( Schedulers.computation() )
        .concatMapCompletable( data -> cache.save( data ))
        .subscribe();
    
    observable.connect();
    
    // Simulate passage of time:
    s.advanceTimeBy( 10, TimeUnit.SECONDS );
}

Output:

Caching 1
Caching 2
Caching 3
Caching 4
Sending 1
Caching 5
Caching 6
Caching 7
Caching 8
Caching 9
Sending 2
Caching 10
. . . 

Update

Based on your comments, it sounds like respecting backpressure is important in your case.

Let's say you have a Publisher somewhere that honors backpressure. You can transform it into a Flowable as follows:

Flowable<T> flowable = Flowable.fromPublisher( publisher );

Once you have a Flowable, you can allow for multiple subscribers without worrying about each subscriber having to request values from the Publisher (or about either subscriber missing any events while the subscriptions are being established). You do that by calling flowable.publish() to create a ConnectableFlowable.


ConnectableFlowable<T> flowable = Flowable.fromPublisher( publisher ).publish();
out.send(flowable);   // calls flowable.subscribe()
cache.save(flowable); // calls flowable.subscribe()
flowable.connect();   // begins emitting values
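
Applied to your original interfaces, a minimal sketch could look like the following (assuming Cache.save and Connection.send each subscribe to the shared Flowable exactly once; autoConnect(2) simply replaces the manual connect() call once both subscribers are in place):

// Sketch against the original Resource/Cache/Connection interfaces.
// publish(bufferSize) keeps the prefetch buffer bounded, and the shared
// stream is paced by the slowest subscriber (the Connection here).
Completable proxy(Resource res, Connection out, Cache cache) {
  Flowable<ByteBuffer> shared = Flowable.fromPublisher(res.fetch())
      .publish(8)      // small, bounded buffer instead of unbounded caching
      .autoConnect(2); // start emitting once both subscribers have subscribed
  return Completable.mergeArray(
      out.send(shared),
      cache.save(shared));
}
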
TrogDor
  • Thanks for the answer. I'm using RxJava 2; the `Publisher` interface is part of reactive-streams `1.0.0`, see https://www.reactive-streams.org/reactive-streams-1.0.0-javadoc/org/reactivestreams/Publisher.html Actually there are some mismatches with your interfaces, since `Cache` and `Connection` accept a `Publisher` (let's use `T` instead of integers and buffers), not plain integers. So the problem is that both `Subscriber`s of the `Publisher` may request the next items from the `Subscription` after the `onSubscribe()` call – Kirill Jun 29 '20 at 09:50
  • @Kirill in your link, `Publisher` has one method: `Publisher::subscribe`. Your question references a method `Publisher::cache` and your code seems to reference `Publisher::doOnNext`. I can't find either of those. – TrogDor Jun 30 '20 at 19:32
  • In my original comment on the question I was talking about the `Subscription` of the `Publisher`, not the `Publisher` itself; it has the method `request(long)`: https://www.reactive-streams.org/reactive-streams-1.0.0-javadoc/org/reactivestreams/Subscription.html#request-long- Also, I've mentioned in the question that back pressure is an important part, and the `Cache` is much faster as a consumer, so it fills up the buffer of the `ConnectableFlowable` and memory will be full of items before they are sent to the slow `Connection`; this is the main problem (see the original question). – Kirill Jul 01 '20 at 07:50
  • @Kirill `ConnectableFlowable` respects backpressure from the slowest consumer and you can control the size of its buffer by calling `Flowable.publish(int bufferSize)`. Assuming your underlying resource/publisher respects backpressure as well, there should be no memory consumption issues. – TrogDor Jul 01 '20 at 13:06
  • Thanks, I've just verified that the connectable publisher respects the back-pressure from the slowest consumer. – Kirill Jul 02 '20 at 14:38
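
A rough way to verify this behaviour is to watch the upstream request amounts while a fast and a slow subscriber share one ConnectableFlowable with a small prefetch (a sketch, using Flowable.range as a stand-in for the real resource):

// Sketch: the upstream only ever sees small, bounded requests, paced by
// the slower of the two subscribers sharing the ConnectableFlowable.
ConnectableFlowable<Integer> shared = Flowable.range(1, 1_000)
    .doOnRequest(n -> System.out.println("upstream requested " + n))
    .publish(8);                                  // bounded prefetch

shared.observeOn(Schedulers.io(), false, 1)
    .doOnNext(i -> Thread.sleep(5))               // slow consumer
    .subscribe();
shared.observeOn(Schedulers.computation(), false, 1)
    .subscribe();                                 // fast consumer

shared.connect();
Thread.sleep(10_000);                             // crude wait inside a test/main method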