Parallel GET Request to specific mapping with WebFlux

Question

I want to call independent requests simultaneously with WebClient. My previous approach with RestTemplate blocked my threads while waiting for the response. So I figured that WebClient with ParallelFlux could use one thread more efficiently, because it is supposed to schedule multiple requests on one thread.

My endpoint expects a tuple of an id and a location.

The fooFlux method will be called a few thousand times in a loop with different parameters. The returned map will be asserted against stored reference values.

Previous attempts at this code resulted in duplicated API calls. But there is still a flaw: the size of the key set of mapping is often less than the size of Set<String> locations. In fact, the size of the resulting map varies between runs, and it is correct every now and then. So there might be an issue with the subscription finishing after the method has returned the map.

public Map<String, ServiceDescription> fooFlux(String id, Set<String> locations) {
    Map<String, ServiceDescription> mapping = new HashMap<>();
    Flux.fromIterable(locations).parallel().runOn(Schedulers.boundedElastic()).flatMap(location -> {
        Mono<ServiceDescription> sdMono = getServiceDescription(id, location);
        Mono<Mono<ServiceDescription>> sdMonoMono = sdMono.flatMap(item -> {
            mapping.put(location, item);
            return Mono.just(sdMono);
        });
        return sdMonoMono;
    }).then().block();
    LOGGER.debug("Input Location size: {}", locations.size());
    LOGGER.debug("Output Location in map: {}", mapping.keySet().size());
    return mapping;
}

Handle Get-Request

private Mono<ServiceDescription> getServiceDescription(String id, String location) {
    String uri = URL_BASE.concat(location).concat("/detail?q=").concat(id);
    Mono<ServiceDescription> serviceDescription =
                    webClient.get().uri(uri).retrieve().onStatus(HttpStatus::isError, clientResponse -> {
                        LOGGER.error("Error while calling endpoint {} with status code {}", uri,
                                        clientResponse.statusCode());
                        throw new RuntimeException("Error while calling Endpoint");
                    }).bodyToMono(ServiceDescription.class).retryBackoff(5, Duration.ofSeconds(15));
    return serviceDescription;
}

Why do you use `JsonNode.class` instead of serializing/deserializing into a concrete object? And why use reactive programming for something that can be solved using `@Async`? Reactive programming is not async programming. They are two different things that complement each other. – Toerktumlare, Feb 29 '20 at 10:03
I used `JsonNode.class` because the received JSON model is huge and I just need a tiny bit of it. I came up with reactive programming because of a Baeldung article (https://www.baeldung.com/spring-webclient-resttemplate). I want to achieve a gain in download speed. The `RestTemplate` approach within parallel streams got me between 10 Mbit/s and 200 Mbit/s download on my network card, depending on the number of locations per id, which varies from 1 to ~4000. – froehli, Feb 29 '20 at 10:22
I've read your answer from Aug 5 '19 about RestTemplate and WebClient. I don't use reactive programming anywhere else. But my endpoint has a load balancer and I can spawn pods to gain the needed threads on my backend. The whole point of my approach is to validate the data on the backend every now and then. – froehli, Feb 29 '20 at 10:47
REST clients don't affect download speeds. Network bandwidth affects speeds. So the client you use will not affect download speed. And if you only need a little piece, who says you need to declare the entire object? Just create a class with the little piece you need. Yes, use WebClient, but you need to know the difference between reactive programming and concurrent programming. – Toerktumlare, Feb 29 '20 at 12:22
Reactive programming is all about not blocking, and utilising threads as much as possible to do both serial and parallel tasks. Concurrent programming, on the other hand, means spawning threads and fetching things concurrently. Reactive programming can do things serially and concurrently to solve the task you give it. But it is not used predominantly to do asynchronous tasks. – Toerktumlare, Feb 29 '20 at 12:24
I will try to declare a subset of the model. Never thought about that. Thanks! I rephrased my question in the original post. Don't you think that this approach can improve the performance of gathering all the information, since my threads are not blocked while waiting for network IO? Could you take a look at my two calls of `block()`? I think that the first one could be spared. – froehli, Feb 29 '20 at 13:39
Well, if you need to gather up all results and return the concrete values to the calling client in one big blob, rather than stream results to the calling client, then block needs to be used in an application that is not reactive, placed on its own scheduler as you have done. But I would suggest using a boundedElastic scheduler; the one you have chosen will use up threads into infinity and, in the worst case, run into thread starvation and crash the application. – Toerktumlare, Feb 29 '20 at 14:03
Great, I will get right to it. So I change the scheduler and the JSON serialization and I am good to go. Thanks for your help! And I think that I need the whole blob because I am validating it in a JUnit test against values from another source. – froehli, Feb 29 '20 at 14:10
I still wonder about the fact that I block the `Mono` in my `flatMap` function, put the result into a map, and still return the same `Mono`. Seems counterintuitive. – froehli, Feb 29 '20 at 15:08
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/208779/discussion-between-froehli-and-thomas-andolf). — froehli, Feb 29 '20 at 18:51

score 3 · Accepted Answer · answered Mar 01 '20 at 22:19

public Map<String, ServiceDescription> fooFlux(String id, Set<String> locations) {
    return Flux.fromIterable(locations)
               .flatMap(location -> getServiceDescription(id, location).map(sd -> Tuples.of(location, sd)))
               .collectMap(Tuple2::getT1, Tuple2::getT2)
               .block();
}

Note: the flatMap operator combined with a WebClient call gives you concurrent execution, so there is no need to use ParallelFlux or any Scheduler.
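
To illustrate why no extra threads are needed for concurrent I/O, here is a minimal sketch that uses only the JDK, not Reactor or WebClient. It is an analogy, not the answer's code: `fakeCall` is a hypothetical stand-in for a non-blocking HTTP call, and the class name, the delay, and the result strings are assumptions made for the example. All calls are started first and are in flight at the same time, although a single timer thread completes every one of them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SingleThreadConcurrencyDemo {

    // Hypothetical stand-in for a non-blocking HTTP call: the returned future
    // is completed by a timer after delayMillis; no thread sleeps while waiting.
    static CompletableFuture<String> fakeCall(ScheduledExecutorService timer,
                                              String location, long delayMillis) {
        CompletableFuture<String> response = new CompletableFuture<>();
        timer.schedule(() -> {
            response.complete("description-of-" + location);
        }, delayMillis, TimeUnit.MILLISECONDS);
        return response;
    }

    static Map<String, String> fetchAll(Set<String> locations, long delayMillis) {
        // One single thread serves every pending "request".
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            // Start every call first, so that all of them are in flight at once.
            Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
            for (String location : locations) {
                pending.put(location, fakeCall(timer, location, delayMillis));
            }
            // Only the calling thread writes to the result map, so a plain map is safe.
            Map<String, String> results = new LinkedHashMap<>();
            pending.forEach((location, response) -> results.put(location, response.join()));
            return results;
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> results = fetchAll(Set.of("berlin", "paris", "rome"), 100);
        System.out.println(results.size() + " descriptions fetched");
    }
}
```

With a few hundred locations and a 100 ms delay, the total run time stays close to the delay of a single call rather than the sum of all delays, which is the effect the note describes for flatMap with WebClient.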

Martin Tarjányi

Thanks Martin, I think that was the point. I could not figure out how to find or correctly use the `collectMap` method in that context. Could you elaborate on why you would not use a `ParallelFlux` with a `Scheduler`? I thought that the shared thread pool of the worker units would speed things up in I/O situations. – froehli Mar 02 '20 at 07:42
As far as I understand, `ParallelFlux` is intended for CPU-intensive work. Concurrent IO can be achieved without it: while waiting for an external resource the CPU is not used, so a high level of concurrency can be achieved with a very low number of threads. `ParallelFlux` just makes things more complex than needed. – Martin Tarjányi Mar 02 '20 at 07:49

Puce · Answer 2 · 2020-02-29T20:06:26.827

The reactive code gets executed when you subscribe to a producer. `block()` subscribes, and since you call it twice (once on the Mono, which you then return again, and once on the ParallelFlux), the Mono gets executed twice.

    List<String> resultList = listMono.block();
    mapping.put(location, resultList);
    return listMono;

Try something like the following instead (untested):

    listMono.map(resultList -> {
       mapping.put(location, resultList);
       return Mono.just(listMono);
    });

That said, the reactive programming model is quite complex, so consider working with @Async and Future/AsyncResult instead if this is only about making the remote calls in parallel, as others suggested. You can still use WebClient (RestTemplate seems to be on its way to being deprecated), but just call block right after bodyToMono.
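
As a rough illustration of that simpler route, here is a minimal sketch that swaps in plain JDK `CompletableFuture` on a fixed thread pool instead of Spring's @Async, so it runs without any framework. `blockingFetch` is a hypothetical stand-in for a blocking client call (for example WebClient followed by block); the class name, the pool size, and the returned strings are assumptions for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BlockingCallsInParallel {

    // Hypothetical stand-in for a blocking remote call; it simply sleeps.
    static String blockingFetch(String id, String location) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return id + "@" + location;
    }

    static Map<String, String> fetchAll(String id, Set<String> locations, int poolSize) {
        // A bounded pool caps how many blocking calls run at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit every call before waiting for any of them.
            Map<String, CompletableFuture<String>> futures = new LinkedHashMap<>();
            for (String location : locations) {
                futures.put(location,
                        CompletableFuture.supplyAsync(() -> blockingFetch(id, location), pool));
            }
            // Collect on the calling thread only, so no shared map is written concurrently.
            Map<String, String> result = new LinkedHashMap<>();
            futures.forEach((location, future) -> result.put(location, future.join()));
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll("42", Set.of("a", "b", "c"), 3));
    }
}
```

Unlike the non-blocking variant, every in-flight call here occupies one pool thread, so the pool size limits the achievable concurrency.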

edited Feb 29 '20 at 20:06

answered Feb 29 '20 at 20:00

Puce

Thanks for your reply. This helped me to eliminate the duplicated requests. In order to prevent an empty map I had to bind the `Mono.just(listMono)` value to a variable that I need to return at the end of the outer flatMap. At first I tried to return `listMono` without creating a new `Mono` with `Mono.just(listMono)`. This also resulted in duplicated requests. Could you explain this behaviour? – froehli Mar 01 '20 at 11:44
Unfortunately the map is not filled completely. The size of the key set of `mapping` is often less than the size of `Set location`. In fact, the size of the resulting map is changing. Do I need an inner subscription or something else to ensure that the map is filled completely before the method returns? I updated my question to reflect the changes that I made in response to the answers here. – froehli Mar 01 '20 at 11:44
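
The thread does not confirm the cause of the missing entries. One possibility worth ruling out is the shared `HashMap`: in the question's code, `mapping.put` is called from callbacks that may run on several threads, and `java.util.HashMap` is not safe for concurrent writes, so entries can get lost. The accepted answer sidesteps this by letting `collectMap` build the map. The following JDK-only sketch (class and method names are hypothetical) shows concurrent writers filling a `ConcurrentHashMap` without losing entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConcurrentPutDemo {

    // Fills the given map from several threads at once, each thread writing
    // its own distinct keys, and returns the map size once all writers finish.
    static int fillConcurrently(Map<String, Integer> mapping, int writers, int keysPerWriter) {
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        CountDownLatch done = new CountDownLatch(writers);
        for (int w = 0; w < writers; w++) {
            final int writer = w;
            pool.execute(() -> {
                try {
                    for (int k = 0; k < keysPerWriter; k++) {
                        mapping.put("location-" + writer + "-" + k, k);
                    }
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            // Wait until every writer has finished before reading the size.
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return mapping.size();
    }

    public static void main(String[] args) {
        // A thread-safe map keeps all 8 * 10_000 distinct keys.
        System.out.println(fillConcurrently(new ConcurrentHashMap<>(), 8, 10_000));
    }
}
```

Passing a plain `HashMap` to the same method is not safe and may produce a smaller, varying size, which would match the symptom described in the comment above.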
