
I need to get the items from all pages of a pageable REST API. I also need to start processing items as soon as they are available, without waiting for all the pages to load. To do so, I'm using Spring WebFlux and its WebClient, and want to return a Flux<Item>. The REST API I'm using is also rate limited, and each response contains headers with details on the current limits:

  • Size of the current window
  • Remaining time in the current window
  • Request quota in window
  • Requests left in current window

The response to a single page request looks like:

{
    "data": [],
    "meta": {
      "pagination": {
        "total": 10,
        "current": 1
      }
    }
}

The data array contains the actual items, while the meta object contains pagination info.
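That JSON maps naturally onto a few small types, and the pagination block alone is enough to decide whether another page must be fetched. A sketch, assuming (as the `Flux.range(1, total)` code below implies) that `total` is the number of pages and `current` is 1-based; the type names `Page`, `Meta` and `Pagination` are illustrative:

```java
import java.util.List;

public class PageModel {

    record Pagination(int total, int current) {}

    record Meta(Pagination pagination) {}

    /** One page of the API response: the items plus pagination metadata. */
    record Page<T>(List<T> data, Meta meta) {

        /** True while the current page is not the last one. */
        boolean hasNext() {
            return meta.pagination().current() < meta.pagination().total();
        }

        /** Number of the page to request next (only meaningful if hasNext()). */
        int nextPage() {
            return meta.pagination().current() + 1;
        }
    }

    public static void main(String[] args) {
        Page<String> first = new Page<>(List.of("a", "b"), new Meta(new Pagination(10, 1)));
        System.out.println(first.hasNext() + " " + first.nextPage()); // true 2
    }
}
```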

My current solution first makes a "dummy" request just to get the total number of pages and the rate limits.

Mono<Paginated> paginated = client.get()
    .uri(uri)
    .exchange()
    .flatMap(response -> {
        HttpHeaders headers = response.headers().asHttpHeaders();

        // Read the rate-limit details sent with this response
        Limits limits = new Limits();
        limits.setWindowSize(headers.getFirst("X-Window-Size"));
        limits.setWindowRemaining(headers.getFirst("X-Window-Remaining"));
        limits.setRequestsQuota(headers.getFirst("X-Requests-Quota"));
        limits.setRequestsLeft(headers.getFirst("X-Requests-Remaining"));

        // Attach the limits to the deserialized page
        // (the lambda parameter must not reuse the name of the local variable "paginated")
        return response.bodyToMono(Paginated.class)
                .map(page -> {
                    page.setLimits(limits);
                    return page;
                });
    });

Afterwards, I emit a Flux of page numbers and make one REST API request per page, delaying each request enough to stay under the limit, and return a Flux of the extracted items:

return paginated.flatMapMany(firstPage -> {
    // One element per page number, spaced out so requests stay under the limit
    return Flux.range(1, firstPage.getMeta().getPagination().getTotal())
            .delayElements(Duration.ofMillis(firstPage.getLimits().getWindowRemaining() / firstPage.getLimits().getRequestsQuota()))
            .flatMap(page -> {
                return client.get()
                        .uri(pageUri)
                        .retrieve()
                        .bodyToMono(Item.class)
                        .flatMapMany(p -> Flux.fromIterable(p.getData()));
            });
});

This does work, but I'm not happy with it because:

  • It makes an initial "dummy" request to get the number of pages, and then repeats the same request to get the actual data.
  • It reads the rate limits only from the initial request and assumes they won't change (e.g., that it's the only client using the API). That may not be true, in which case it will get an error for exceeding the limit.

So my question is: how can I refactor this so that it doesn't need the initial request, but instead gets the limits, page count and data from the first real request, then continues through all pages while updating (and respecting) the limits?

  • You can check out the solutions to this question, they might help you: https://stackoverflow.com/questions/53274568/how-to-collect-paginated-api-responses-using-spring-boot-webclient – Raghu Molabanti Nov 27 '18 at 11:05

1 Answer


I think this code will do what you want. The idea is to create a Flux that calls your resource server and, while handling each response, pushes a new event onto that same Flux so that the call for the next page gets made.

The code is composed of:

A simple wrapper that contains the next page to call and the delay to wait before making the call:

private class WaitAndNext{
    private String next;
    private long delay;
}

A FluxProcessor that makes the HTTP calls and processes the responses:

FluxProcessor<WaitAndNext, WaitAndNext> processor = DirectProcessor.<WaitAndNext>create();
FluxSink<WaitAndNext> sink = processor.sink();

processor
    // Wait the requested delay before each call
    .flatMap(x -> Mono.just(x).delayElement(Duration.ofMillis(x.delay)))
    // Build a client pointing at the page to fetch
    .map(x -> WebClient.builder()
            .baseUrl(x.next)
            .defaultHeader("Accept", "application/json")
            .build())
    // Call it; manageResponse emits the items and pushes the next page to the sink
    .flatMap(x -> x.get()
            .exchange()
            .flatMapMany(z -> manageResponse(sink, z)))
    .subscribe(........);
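The core of this design is a feedback loop: handling one response produces the request for the next page, and the loop ends when no further request is produced. Stripped of Reactor and HTTP, the same control flow can be sketched with a plain queue. Everything here (`Request`, `Response`, `fetchPage`, the fake server and the fixed 10 ms delay) is hypothetical; the queue stands in for the `DirectProcessor`/`FluxSink` pair and `fetchPage` for the WebClient call:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class FeedbackLoop {

    /** Plays the role of WaitAndNext: which page to fetch and how long to wait first. */
    record Request(int page, long delayMillis) {}

    /** Hypothetical response: the page's items plus its pagination metadata. */
    record Response(List<String> items, int current, int total) {}

    /** Fake server with `serverPages` pages of two items each. */
    static Response fetchPage(int page, int serverPages) {
        return new Response(List.of("item" + page + "a", "item" + page + "b"), page, serverPages);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }

    /** serverPages only configures the fake server; the loop reads `total` from each response. */
    static List<String> fetchAll(int serverPages) {
        Queue<Request> queue = new ArrayDeque<>();   // stands in for the processor/sink
        List<String> received = new ArrayList<>();
        queue.add(new Request(1, 0));                // bootstrap: first page, no delay
        while (!queue.isEmpty()) {
            Request request = queue.poll();
            sleep(request.delayMillis());            // respect the delay chosen for this request
            Response response = fetchPage(request.page(), serverPages);
            if (response.current() < response.total()) {
                // Like sink.next(wn): handling this response schedules the next one.
                // A real version would compute the delay from this response's headers.
                queue.add(new Request(response.current() + 1, 10));
            }
            received.addAll(response.items());      // like the items manageResponse emits
        }
        return received;  // nothing left to request: the equivalent of sink.complete()
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(3)); // [item1a, item1b, item2a, item2b, item3a, item3b]
    }
}
```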

I split out a method that only manages the response: it simply unwraps your data AND adds a new event to the sink (the event being the next page to call after the given delay).

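private Flux<Data> manageResponse(FluxSink<WaitAndNext> sink, ClientResponse resp) {

    if (resp.statusCode() != HttpStatus.OK) {
        sink.error(new IllegalStateException("Status code invalid"));
        // Stop here: don't read the body or schedule another page
        return Flux.empty();
    }

    // Delay before the next call, computed from this response's rate-limit headers
    WaitAndNext wn = new WaitAndNext();
    HttpHeaders headers = resp.headers().asHttpHeaders();
    wn.delay = Integer.parseInt(headers.getFirst("X-Window-Remaining")) / Integer.parseInt(headers.getFirst("X-Requests-Quota"));

    return resp.bodyToMono(Item.class)
        .flatMapMany(p -> {
            // Pagination info lives under meta.pagination in the JSON
            if (p.getMeta().getPagination().getCurrent() >= p.getMeta().getPagination().getTotal()) {
                // Last page: complete the loop
                sink.complete();
            } else {
                // Schedule the next page after the computed delay
                wn.next = "https://....?page=" + (p.getMeta().getPagination().getCurrent() + 1);
                sink.next(wn);
            }
            return Flux.fromIterable(p.getData());
        });
}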
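The decision `manageResponse` makes for each page (stop, or request page `current + 1` after a delay derived from the headers) can be isolated into a pure function that is easy to unit-test without a server. A sketch using the same delay formula as above; `decide`, the `baseUrl` parameter and the `https://api.example.com/items` URL are hypothetical, since the real URL is elided:

```java
import java.util.Map;
import java.util.Optional;

public class NextPageDecision {

    /** Same shape as WaitAndNext, made immutable. */
    record WaitAndNext(String next, long delay) {}

    /** Returns the next request to make, or empty when the last page was reached. */
    static Optional<WaitAndNext> decide(String baseUrl, int current, int total,
                                        Map<String, String> headers) {
        if (current >= total) {
            return Optional.empty();  // caller completes the sink
        }
        long windowRemaining = Long.parseLong(headers.get("X-Window-Remaining"));
        long requestsQuota = Long.parseLong(headers.get("X-Requests-Quota"));
        // Same formula as manageResponse, guarded against a zero quota
        long delay = windowRemaining / Math.max(requestsQuota, 1);
        return Optional.of(new WaitAndNext(baseUrl + "?page=" + (current + 1), delay));
    }

    public static void main(String[] args) {
        Map<String, String> headers = Map.of("X-Window-Remaining", "5000", "X-Requests-Quota", "50");
        System.out.println(decide("https://api.example.com/items", 1, 3, headers));
        // Optional[WaitAndNext[next=https://api.example.com/items?page=2, delay=100]]
        System.out.println(decide("https://api.example.com/items", 3, 3, headers));
        // Optional.empty
    }
}
```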

Finally, start the whole thing by requesting the first page with no delay:

WaitAndNext wn=new WaitAndNext();
wn.next="https://....?page=1";
wn.delay=0;
sink.next(wn);
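Note that `DirectProcessor` and the other `FluxProcessor` implementations were deprecated in Reactor 3.4 in favour of the `Sinks` API. The same "derive the next request from the previous result" recursion is what Reactor's `expand` operator (`Mono.expand` / `Flux.expand`) provides, so it may be worth considering as an alternative to a hand-driven sink. As a dependency-free illustration of that recursion, JDK 9's `Stream.iterate(seed, hasNext, next)` can walk the pages lazily, handing items on as each page arrives. `Page`, `fetch`, `fetchNext` and the fixed 10 ms delay are hypothetical stand-ins for the HTTP call and the header-derived delay:

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class ExpandStyle {

    /** Hypothetical page: items, page number, page count and the delay its headers imply. */
    record Page(List<String> items, int current, int total, long delayMillis) {
        boolean hasNext() {
            return current < total;
        }
    }

    /** Fake fetch standing in for one rate-limited HTTP call. */
    static Page fetch(int page, int serverPages) {
        // A real call would derive delayMillis from this response's rate-limit headers.
        return new Page(List.of("p" + page + "-1", "p" + page + "-2"), page, serverPages, 10);
    }

    /** Waits the delay reported by the previous page, then fetches the next one. */
    static Page fetchNext(Page previous) {
        try {
            Thread.sleep(previous.delayMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        return fetch(previous.current() + 1, previous.total());
    }

    /** serverPages only configures the fake server; the walk reads `total` from each page. */
    static void forEachItem(int serverPages, Consumer<String> consumer) {
        Stream.iterate(
                        fetch(1, serverPages),                     // first real request, no dummy call
                        Objects::nonNull,                          // null marks "no more pages"
                        page -> page.hasNext() ? fetchNext(page) : null)
                .flatMap(page -> page.items().stream())
                .forEach(consumer);
    }

    public static void main(String[] args) {
        forEachItem(2, System.out::println); // p1-1, p1-2, p2-1, p2-2, one per line
    }
}
```

Unlike the sink-based version, nothing outside the pipeline has to push events into it, and the recursion stops by itself once a page reports it is the last.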
wargre