
I am looking for an equivalent of the batch and conflate operators from Akka Streams in Project Reactor, or some combination of operators that mimics their behavior.

The idea is to aggregate upstream items in a reduce-like manner while the downstream applies backpressure.

Note that this is different from this question because the throttleLatest / conflate operator described there is different from the one in Akka Streams.
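To make the semantics concrete, here is a plain-Java model of what I mean by conflating. The Conflater class and its push/poll methods are made-up names for this sketch, not an existing Reactor or Akka API: while the consumer is busy, every incoming item is folded into an accumulator via seed/aggregate functions; as soon as the consumer asks again, the accumulator is handed over without any extra delay.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Single-threaded illustration of conflate semantics (hypothetical names):
// items arriving while the consumer is busy are folded into an accumulator;
// a ready consumer receives the accumulator immediately, with no fixed delay.
class Conflater<T, A> {
    private final Function<T, A> seed;
    private final BiFunction<A, T, A> aggregate;
    private A pending;          // items accumulated since the last poll
    private boolean hasPending; // distinguishes "empty" from an accumulated null

    Conflater(Function<T, A> seed, BiFunction<A, T, A> aggregate) {
        this.seed = seed;
        this.aggregate = aggregate;
    }

    // Called by the producer for every upstream item.
    void push(T item) {
        pending = hasPending ? aggregate.apply(pending, item) : seed.apply(item);
        hasPending = true;
    }

    // Called by the consumer when it is ready for the next batch;
    // returns null when nothing has arrived since the last poll.
    A poll() {
        if (!hasPending) return null;
        A result = pending;
        pending = null;
        hasPending = false;
        return result;
    }
}
```

With a set-building seed and aggregate, three pushes of "a", "b", "a" while the consumer is busy collapse into a single set {"a", "b"} on the next poll.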

Some background regarding what I need this for:

I am watching a change stream on a MongoDB and for every change I run an aggregate query on the MongoDB to update some metric. When lots of changes come in, the queries can't keep up and I'm getting errors. Since I only need the latest value of the aggregate query, it is fine to coalesce multiple change events and run the query less often. But I want the metric to be as up to date as possible, so I want to avoid waiting a fixed amount of time when there is no backpressure.

The closest I have come so far is this:

changeStream
    .window(Duration.ofSeconds(1))
    .concatMap { it.reduce(setOf<String>(), { applicationNames, event -> applicationNames + event.body.sourceReference.applicationName }) }
    .concatMap { Flux.fromIterable(it) }
    .concatMap { taskRepository.findTaskCountForApplication(it) }

but this would always wait for 1 second regardless of backpressure.

What I would like is something like this:

changeStream
    .conflateWithSeed({setOf(it.body.sourceReference.applicationName)}, {applicationNames, event -> applicationNames + event.body.sourceReference.applicationName})
    .concatMap { Flux.fromIterable(it) }
    .concatMap { taskRepository.findTaskCountForApplication(it) }
lbilger

1 Answer


I assume you only ever run one query at a time - no parallel execution. My idea is to buffer elements in a list (which can easily be aggregated) while the query is running. As soon as the query finishes, the next list is emitted and processed.

I tested it with the following code:

    import java.time.Duration;
    import java.util.Random;
    import java.util.concurrent.atomic.AtomicBoolean;
    import reactor.core.publisher.Flux;
    import reactor.core.publisher.Mono;
    import reactor.core.scheduler.Schedulers;

    // A plain local boolean cannot be mutated from inside the lambdas
    // (it would have to be effectively final), so use an AtomicBoolean.
    AtomicBoolean isQueryRunning = new AtomicBoolean(false);
    Flux.range(0, 1000000)
            .delayElements(Duration.ofMillis(10))
            // close the current buffer as soon as no query is in flight
            .bufferUntil(i -> !isQueryRunning.get())
            .doOnNext(integers -> isQueryRunning.set(true))
            .concatMap(integers -> Mono.fromCallable(() -> {
                        int sleepTime = new Random().nextInt(10000);
                        System.out.println("processing " + integers.size() + " elements. Sleep time: " + sleepTime);
                        Thread.sleep(sleepTime);
                        return "";
                    })
                    .subscribeOn(Schedulers.elastic()))
            .doOnNext(s -> isQueryRunning.set(false))
            .subscribe();

which prints

processing 1 elements. Sleep time: 4585
processing 402 elements. Sleep time: 2466
processing 223 elements. Sleep time: 2613
processing 236 elements. Sleep time: 5172
processing 465 elements. Sleep time: 8682
processing 787 elements. Sleep time: 6780

It's clearly visible that the size of the next batch is proportional to the previous query's execution time (sleep time).

Note that this is not a "real" backpressure solution, just a workaround. It is also not suited for parallel execution, and it might require some tuning to avoid running queries for empty batches.
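The dynamics can also be shown without Reactor in a deterministic, single-threaded model (the BatchWhileBusyDemo class and its tick-based timing are invented for this sketch): one item arrives per tick while a query "runs" for a given number of ticks, and each finished query drains the whole backlog as the next batch, so batch size tracks query duration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Deterministic model of the buffer-while-busy idea: items queue up while
// a query is running; each finished query takes the whole backlog as one batch.
class BatchWhileBusyDemo {
    static List<Integer> batchSizes(int totalItems, int[] queryDurations) {
        Queue<Integer> backlog = new ArrayDeque<>();
        List<Integer> sizes = new ArrayList<>();
        int produced = 0;
        for (int query = 0; query < queryDurations.length && produced < totalItems; query++) {
            // one item arrives per tick while the current query runs
            for (int tick = 0; tick < queryDurations[query] && produced < totalItems; tick++) {
                backlog.add(produced++);
            }
            // query finished: drain the whole backlog as the next batch
            sizes.add(backlog.size());
            backlog.clear();
        }
        return sizes;
    }
}
```

With query durations of 3, 5, and 2 ticks, the batch sizes come out as 3, 5, and 2 elements - the proportionality seen in the printed output above.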

arap