I want to go through a list of domain names (millions of records), send a request to each one, and use the response to figure out whether the host is alive.
I chose a reactive approach and expected it to serve a huge number of hosts with only a few threads, but heap usage grows constantly until an OutOfMemoryError is thrown.
Here is my code:
    @Slf4j
    @Component
    @RequiredArgsConstructor
    public static class DataLoader implements CommandLineRunner {

        private final ReactiveDomainNameRepository reactiveDomainNameRepository;

        @Override
        @SneakyThrows
        public void run(String... strings) {
            ReactorClientHttpConnector connector = getConnector(); // Trying to reuse connector instead of creating new each time

            reactiveDomainNameRepository.findAllByResourcesIsNull() // Flux<DomainEntity>. This basically streams data from MongoDB using reactive driver
                .publishOn(Schedulers.parallel())
                .flatMap(domain -> performRequest(connector, domain)) // If I remove this line everything starts working just fine
                .buffer(1000) // Little optimization. The problem with memory remains even if I don't use buffering.
                .flatMap(reactiveDomainNameRepository::saveAll)
                .subscribe();
        }

        private Mono<DomainEntity> performRequest(ReactorClientHttpConnector connector, DomainEntity domain) {
            return WebClient
                .builder()
                .clientConnector(connector)
                .baseUrl("http://" + domain.getHost())
                .build()
                .get()
                .exchange()
                .onErrorResume(error -> {
                    log.error("Error while requesting '{}': ", domain.getHost());
                    return Mono.empty();
                }) // Mono<ClientResponse>
                .flatMap(resp -> {
                    if (resp.statusCode() == OK) {
                        log.info("Host '{}' is ok", domain.getHost());
                    } else {
                        log.info("Host '{}' returned '{}' status code", domain.getHost(), resp.statusCode().value());
                    }
                    // Consuming response as described in Spring documentation. Tried also resp.toEntity(String.class) but got the same result
                    return resp.toEntity(Void.class)
                        .map(nothing -> domain);
                });
        }
    }
Here is the heap memory usage. Ignore the period 5:59 - 6:05: that is where the application stopped processing data because I didn't handle a corner case. Usually, heap usage just keeps growing until it reaches the memory limit.
So I have basically two questions:
- What's wrong with my code?
- Is it a good idea to use a reactive approach to make a huge number of requests to different hosts?
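
To make the second question more concrete, here is a minimal sketch of the behaviour I was expecting, written in plain Java (JDK 11+ `java.net.http`) rather than Reactor so that it is self-contained. It only illustrates the idea of keeping a bounded number of requests in flight while the source is read; it is not what my application does, and I am not claiming it is the right fix. `BoundedProbe`, `probeAll`, `httpProbe` and `maxInFlight` are names made up for this illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

public class BoundedProbe {

    /**
     * Probes every host, keeping at most maxInFlight probes outstanding.
     * The calling thread blocks on the semaphore once the limit is reached,
     * so the host list is never consumed faster than responses come back.
     */
    public static Map<String, Boolean> probeAll(
            List<String> hosts,
            int maxInFlight,
            Function<String, CompletableFuture<Boolean>> probe) {
        Semaphore permits = new Semaphore(maxInFlight);
        Map<String, Boolean> alive = new ConcurrentHashMap<>();

        for (String host : hosts) {
            permits.acquireUninterruptibly(); // wait for a free slot
            CompletableFuture<Boolean> pending;
            try {
                pending = probe.apply(host);
            } catch (RuntimeException e) {
                // e.g. a malformed host name rejected while building the request
                pending = CompletableFuture.failedFuture(e);
            }
            pending
                .exceptionally(error -> false)                       // any failure counts as "not alive"
                .thenAccept(ok -> alive.put(host, ok))
                .whenComplete((ignored, error) -> permits.release()); // always free the slot
        }

        // All permits are available again only once every probe has finished.
        permits.acquireUninterruptibly(maxInFlight);
        permits.release(maxInFlight);
        return alive;
    }

    /** A probe that issues a GET to http://host, discards the body and reports whether the status was 200. */
    public static Function<String, CompletableFuture<Boolean>> httpProbe(HttpClient client) {
        return host -> {
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://" + host)).GET().build();
            return client
                .sendAsync(request, HttpResponse.BodyHandlers.discarding())
                .thenApply(response -> response.statusCode() == 200);
        };
    }
}
```

It would be called as something like `BoundedProbe.probeAll(hosts, 256, BoundedProbe.httpProbe(HttpClient.newHttpClient()))`, where 256 is an arbitrary limit picked for the example. The point is that memory use is tied to `maxInFlight` rather than to the total number of hosts, which is the property I assumed the reactive pipeline above would give me.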