I am new to the world of Spring WebFlux and Project Reactor, so I am not aware of an out-of-the-box pattern that solves your problem. However, you can build your own pattern to limit the number of groups created by the groupBy operator.
In the example below I used the pattern int partition = i % numberOfPartitions;, inspired by this Apache Flink blog post, to decide how many partitions to split the stream into.
public Flux<GroupedFlux<Integer, Data>> createFluxUsingGroupBy(List<String> dataList, int numberOfPartitions, int maxCount) {
    return Flux
            .fromStream(IntStream.range(0, maxCount)
                    .mapToObj(i -> {
                        int randomPosition = ThreadLocalRandom.current().nextInt(0, dataList.size());
                        int partition = i % numberOfPartitions;
                        return new Data(i, dataList.get(randomPosition), partition);
                    })
            )
            .delayElements(Duration.ofMillis(10))
            .log()
            .groupBy(Data::getPartition);
}
........
@lombok.Data
@AllArgsConstructor
@NoArgsConstructor
public class Data {
    private Integer key;
    private String value;
    private Integer partition;
}
When I execute it with numberOfPartitions = 3, I get partitions 0 to 2 (3 partitions), regardless of the key used.
@Test
void testFluxUsingGroupBy() {
    int numberOfPartitions = 3;
    int maxCount = 100;
    Flux<GroupedFlux<Integer, Data>> dataGroupedFlux = fluxAndMonoTransformations.createFluxUsingGroupBy(expect, numberOfPartitions, maxCount);
    StepVerifier.create(dataGroupedFlux)
            .expectNextCount(numberOfPartitions)
            .verifyComplete();
}
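The modulo rule can also be checked with plain Java, independent of Reactor: every key lands in the range 0 to numberOfPartitions - 1, and sequential keys spread evenly across the groups. A minimal sketch (the PartitionDemo class name is mine, not part of the code above):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PartitionDemo {
    public static void main(String[] args) {
        int numberOfPartitions = 3;
        int maxCount = 100;
        // Count how many keys land in each partition using the same modulo rule.
        Map<Integer, Long> distribution = IntStream.range(0, maxCount)
                .boxed()
                .collect(Collectors.groupingBy(
                        i -> i % numberOfPartitions,
                        TreeMap::new,
                        Collectors.counting()));
        System.out.println(distribution); // only keys 0, 1 and 2 appear
    }
}
```

This is why the StepVerifier expectation of exactly numberOfPartitions emissions holds: groupBy emits one GroupedFlux per distinct partition value, and the modulo can only produce numberOfPartitions distinct values.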
Here is the log:
10:43:02.168 [Test worker] INFO reactor.Flux.ConcatMap.1 - onSubscribe(FluxConcatMap.ConcatMapImmediate)
10:43:02.179 [Test worker] INFO reactor.Flux.ConcatMap.1 - request(256)
10:43:02.291 [parallel-1] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=0, value=Spring, partition=0))
10:43:02.362 [parallel-1] INFO reactor.Flux.ConcatMap.1 - request(1)
10:43:02.375 [parallel-2] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=1, value=Scala, partition=1))
10:43:02.377 [parallel-2] INFO reactor.Flux.ConcatMap.1 - request(1)
10:43:02.388 [parallel-3] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=2, value=reactive programming, partition=2))
10:43:02.389 [parallel-3] INFO reactor.Flux.ConcatMap.1 - request(1)
10:43:02.400 [parallel-4] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=3, value=java with lambda, partition=0))
10:43:02.411 [parallel-1] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=4, value=Spring, partition=1))
10:43:02.422 [parallel-2] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=5, value=java 8, partition=2))
10:43:02.433 [parallel-3] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=6, value=java with lambda, partition=0))
10:43:02.444 [parallel-4] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=7, value=java with lambda, partition=1))
...
To enhance this solution when there is no private Integer key; available on the Data object, I can derive the partition from a hash of the value instead. I also added another parameter, parallelism. It is useful for a restore operation: if you save the values to storage using a parallelism of X and later read the same values back with a different parallelism != X, you can still keep each value in the same group. So I used int partition = (getDifferentHashCode(value) * parallelism) % numberOfPartitions;, which is also inspired by the blog post I mentioned. I prefer this approach.
public Flux<GroupedFlux<Integer, Data>> createFluxUsingHashGroupBy(List<String> dataList, int numberOfPartitions, int parallelism, int maxCount) {
    return Flux
            .fromStream(IntStream.range(0, maxCount)
                    .mapToObj(i -> {
                        int randomPosition = ThreadLocalRandom.current().nextInt(0, dataList.size());
                        String value = dataList.get(randomPosition);
                        int partition = (getDifferentHashCode(value) * parallelism) % numberOfPartitions;
                        return new Data(i, value, partition);
                    })
            )
            .delayElements(Duration.ofMillis(10))
            .log()
            .groupBy(Data::getPartition);
}
public int getDifferentHashCode(String value) {
    int hash = 7;
    for (int i = 0; i < value.length(); i++) {
        hash = hash * 31 + value.charAt(i);
    }
    return hash;
}
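One caveat with the hash-based variant: for longer strings the int arithmetic in getDifferentHashCode can overflow to a negative value, in which case the % operator yields a negative partition and groupBy would create more groups than intended. Math.floorMod keeps the result in the expected range. A minimal sketch of that safer variant (the demo class and sample values are mine):

```java
public class HashPartitionDemo {
    // Same polynomial hash as getDifferentHashCode above; may overflow to a negative int.
    static int getDifferentHashCode(String value) {
        int hash = 7;
        for (int i = 0; i < value.length(); i++) {
            hash = hash * 31 + value.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        int numberOfPartitions = 3;
        int parallelism = 2;
        for (String value : new String[]{"Spring", "Scala", "reactive programming"}) {
            // Math.floorMod keeps the partition in [0, numberOfPartitions),
            // even when the hash is negative; plain % would not.
            int partition = Math.floorMod(
                    getDifferentHashCode(value) * parallelism, numberOfPartitions);
            System.out.println(value + " -> partition " + partition);
        }
    }
}
```

With short values like the ones in this example the plain % works, but Math.floorMod makes the StepVerifier expectation of exactly numberOfPartitions groups hold for arbitrary input strings as well.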
unit test:
@Test
void testFluxUsingHashGroupBy() {
    int numberOfPartitions = 3;
    int parallelism = 2;
    int maxCount = 100;
    Flux<GroupedFlux<Integer, Data>> dataGroupedFlux = fluxAndMonoTransformations.createFluxUsingHashGroupBy(expect, numberOfPartitions, parallelism, maxCount);
    StepVerifier.create(dataGroupedFlux)
            .expectNextCount(numberOfPartitions)
            .verifyComplete();
}
Regarding the backpressure questions, I think they deserve a separate SO question.