I want to go through a list of domain names (millions of records) and send a request to each one to find out whether the host is alive.

I chose a reactive approach and expected it to serve a huge number of hosts with only a few threads, but I noticed that heap usage grows constantly until an OutOfMemoryError is thrown.

Here is my code:

@Slf4j
@Component
@RequiredArgsConstructor
public static class DataLoader implements CommandLineRunner {

    private final ReactiveDomainNameRepository reactiveDomainNameRepository;

    @Override
    @SneakyThrows
    public void run(String... strings) {
        ReactorClientHttpConnector connector = getConnector(); // Trying to reuse connector instead of creating new each time

        reactiveDomainNameRepository.findAllByResourcesIsNull() // Flux<DomainEntity>. This basically streams data from MongoDB using reactive driver
                .publishOn(Schedulers.parallel())
                .flatMap(domain -> performRequest(connector, domain)) // If I remove this line everything starts working just fine
                .buffer(1000) // Little optimization. The problem with memory remains even if I don't use buffering.
                .flatMap(reactiveDomainNameRepository::saveAll)
                .subscribe();
    }

    private Mono<DomainEntity> performRequest(ReactorClientHttpConnector connector, DomainEntity domain) {
        return WebClient
                .builder()
                .clientConnector(connector)
                .baseUrl("http://" + domain.getHost())
                .build()
                .get()
                .exchange()
                .onErrorResume(error -> {
                    log.error("Error while requesting '{}'", domain.getHost(), error); // pass the exception so its cause is logged

                    return Mono.empty();
                }) // Mono<ClientResponse>
                .flatMap(resp -> {

                    if (resp.statusCode() == OK) {
                        log.info("Host '{}' is ok", domain.getHost());
                    } else {
                        log.info("Host '{}' returned '{}' status code", domain.getHost(), resp.statusCode().value());
                    }

                    // Consuming response as described in Spring documentation. Tried also resp.toEntity(String.class) but got the same result
                    return resp.toEntity(Void.class)
                            .map(nothing -> domain);
                });
    }
}

Here is the heap memory usage (chart not reproduced here). Ignore the period 5:59 - 6:05: that is where the application stopped processing data because I didn't handle a corner case. Usually, it just keeps growing until it reaches the memory limit.

So I have basically two questions:

  1. What's wrong with my code?
  2. Is it a good idea to use a reactive approach to make a huge number of requests to different hosts?
kerbermeister
Danylo Zatorsky

2 Answers

Just use retrieve() instead of exchange() and you won't even need the messy error handling.

I know it's kind of a late reply, but I was facing the exact same problem a while ago, bumped into your question, and just wanted to leave this possible solution here. :)

And to answer your questions:

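  1. When you use exchange(), you are responsible for consuming or releasing the response body (otherwise the connection is not released) and for handling errors yourself. It is not recommended and should be used only if you really need that level of control; exchange() has been deprecated since Spring Framework 5.3 because of this leak risk.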

  2. Well, you are using the built-in parallelism, so why not.
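
As a rough illustration of the retrieve() suggestion, here is a minimal sketch. It is not the asker's code: the HostCheck class, the check method and the 10-second timeout are assumptions made for the example, and toBodilessEntity() assumes Spring Framework 5.2 or later. Because retrieve() turns 4xx/5xx responses into WebClientResponseException, the sketch maps that exception back to "alive" to keep the behaviour of the question, where any HTTP answer counts.

```java
import java.time.Duration;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClientResponseException;
import reactor.core.publisher.Mono;

public class HostCheck {

    // Emits the host if it produced any HTTP response, completes empty otherwise.
    static Mono<String> check(WebClient client, String host) {
        return client.get()
                .uri("http://" + host)
                .retrieve()                        // 4xx/5xx become WebClientResponseException
                .toBodilessEntity()                // keeps status and headers, releases the body
                .map(entity -> host)
                .timeout(Duration.ofSeconds(10))   // assumed limit, not taken from the question
                // The host answered, just not with a success status: still alive.
                .onErrorResume(WebClientResponseException.class, e -> Mono.just(host))
                // DNS failures, refused connections, timeouts: treat as not alive.
                .onErrorResume(e -> Mono.empty());
    }

    public static void main(String[] args) {
        WebClient client = WebClient.create();     // one shared client for all hosts
        String alive = check(client, "example.com").block();
        System.out.println(alive == null ? "not reachable" : alive + " is alive");
    }
}
```

Building the WebClient once and passing it in, rather than calling WebClient.builder() per request as in the question, also avoids a little allocation per host, though that alone is unlikely to explain the heap growth.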

Roeniss
nooxy

I am new to reactive, and this question is old, but hopefully, it helps someone.

As you noted in your code sample, the line:

flatMap(domain -> performRequest(connector, domain))

is causing the issue. flatMap subscribes to its inner publishers eagerly: if your Flux has 100 elements, up to 100 web requests are started at once (Reactor caps the number of concurrent inner subscriptions, 256 by default, but that is not too relevant here). Every downstream step also runs eagerly, and each element that is still being processed stays referenced, so the GC cannot reclaim it. Hence the memory issues.

https://medium.com/swlh/understanding-reactors-flatmap-operator-a6a7e62d3e95
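
To make the eagerness concrete, here is a small sketch using reactor-core only. It relies on the flatMap overload that takes a concurrency argument, which is not something this answer uses; fakeRequest is a hypothetical stand-in for performRequest that "takes" 50 ms and counts how many calls are in flight.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class BoundedFlatMap {

    // Hypothetical stand-in for performRequest: 50 ms of simulated latency.
    static Mono<Integer> fakeRequest(int id, AtomicInteger inFlight, AtomicInteger peak) {
        return Mono.defer(() -> {
            // Runs at subscription time, i.e. when flatMap actually starts the "request".
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            return Mono.delay(Duration.ofMillis(50))
                    .doOnNext(tick -> inFlight.decrementAndGet()) // "response" arrived
                    .map(tick -> id);
        });
    }

    // Runs `total` fake requests with at most `concurrency` subscribed at once
    // and returns the highest number of simultaneous requests observed.
    static int peakInFlight(int total, int concurrency) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Flux.range(0, total)
                .flatMap(id -> fakeRequest(id, inFlight, peak), concurrency)
                .blockLast();
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak with concurrency 10: " + peakInFlight(200, 10));
    }
}
```

Lowering the concurrency argument trades throughput for fewer responses held in memory at the same time; whether that is enough to fix the heap growth in the question would have to be measured.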

An alternative is concatMap. It behaves similarly to flatMap but is not eager: it waits for the performRequest Mono to complete before making the next request. This is much slower, but it uses less memory.

Whats the difference between flatMap, flatMapSequential and concatMap in Project Reactor?
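
The difference can be seen in a toy example (reactor-core only). fakeRequest here is a hypothetical stand-in for performRequest in which lower ids take longer to answer: concatMap runs the requests one at a time and keeps the source order, while flatMap starts them all and emits results as they arrive.

```java
import java.time.Duration;
import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class ConcatVsFlatMap {

    // Hypothetical stand-in for performRequest: id 0 takes 300 ms, id 1 200 ms, id 2 100 ms.
    static Mono<Integer> fakeRequest(int id) {
        return Mono.delay(Duration.ofMillis(100L * (3 - id))).map(tick -> id);
    }

    // One request at a time: total time is the sum of the delays, order is preserved.
    static List<Integer> withConcatMap() {
        return Flux.just(0, 1, 2)
                .concatMap(ConcatVsFlatMap::fakeRequest)
                .collectList()
                .block();
    }

    // All requests started eagerly: total time is roughly the longest delay,
    // and results come back in completion order.
    static List<Integer> withFlatMap() {
        return Flux.just(0, 1, 2)
                .flatMap(ConcatVsFlatMap::fakeRequest)
                .collectList()
                .block();
    }

    public static void main(String[] args) {
        System.out.println("concatMap: " + withConcatMap()); // [0, 1, 2]
        System.out.println("flatMap:   " + withFlatMap());   // completion order, here [2, 1, 0]
    }
}
```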

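Finally, to win back some performance, you can use the buffer operator you already have to split the requests into batches small enough not to cause a memory issue: the batches run one after another with concatMap, and the requests inside each batch run concurrently with flatMap.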

reactiveDomainNameRepository.findAllByResourcesIsNull()
        .publishOn(Schedulers.parallel()) // Not sure how this will impact, may need to move to after the Flux.fromIterable
        .buffer(1000) // Adjust value based on your requirements. Buffer into groups before making requests
        .concatMap(domains -> // Will wait for the batch to complete before calling for the next
                Flux.fromIterable(domains)
                        .flatMap(domain -> performRequest(connector, domain)) // Make all the requests of this batch concurrently
                        .collectList() // saveAll expects a collection, not a single entity
                        .flatMapMany(reactiveDomainNameRepository::saveAll)) // Save the results of the batch in one call
        .subscribe();
kerbermeister
user2131323