I am trying to call a paginated API (e.g. a Search API) from a reader that extends AbstractPaginatedDataItemReader. I want to keep calling this API until a page comes back with no more data. I am trying to continue the chunk after every page, but the batch never gets past page 1. Here is the code and configuration I am using.

The launch context is as follows:

<batch:job id="fileupload">
    <batch:step id="readApi">
        <batch:tasklet>
            <batch:chunk reader="readPaginatedApi" processor="processApiResults"
                         writer="emailItemWriter" commit-interval="10"/>
        </batch:tasklet>
        <batch:next on="NEXT_PAGE" to="readPaginatedApi"/>
        <batch:end on="END" />
    </batch:step>
</batch:job>

And here is the reader snippet

@Component("readPaginatedApi")
@Scope("step")
public class ReadPaginatedApi extends AbstractPaginatedDataItemReader<SearchResponse> {

@BeforeStep
public void beforeStep(StepExecution stepExecution) {
    this.setName("READER");
    this.setExecutionContextName("READER");
    
    String pageSizeString = stepExecution.getJobParameters().getString("page_size");
    if (StringUtils.isNotBlank(pageSizeString) && NumberUtils.isParsable(pageSizeString)) {
        try {
            pageSize = Integer.parseInt(pageSizeString);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    String pageString = stepExecution.getJobParameters().getString("page");
    if (StringUtils.isNotBlank(pageString) && NumberUtils.isParsable(pageString)) {
        try {
            page = Integer.parseInt(pageString);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

@Override
protected Iterator<Payee> doPageRead() {
    //Call API 
    //Return iterator of results or empty iterator
}

@AfterStep
public ExitStatus afterStep(StepExecution stepExecution) {
    AtomicInteger pageAtomicInteger = new AtomicInteger(page);
    SearchResponse searchResponse = //call service, get response
    if (searchResponse != null && CollectionUtils.isNotEmpty(searchResponse.getItems())) {
        pageAtomicInteger.set(page + 1);
        return new ExitStatus("NEXT_PAGE", String.format("page %d", page));
    }
    return new ExitStatus("END", String.format("page %d", page));
}

}

What am I missing here? How can I make this work? Is this the right approach for this case? I would appreciate any help on this.

g0c00l.g33k

2 Answers


batch:next, batch:end, etc. are used to define the execution flow between the steps of your job. They are not intended for iterating over the pages of a paging item reader; they operate at a higher level.

What you need to do is extend AbstractPaginatedDataItemReader and implement doPageRead. Your implementation should maintain the state of which page is currently being read, the list of items, etc.
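To illustrate the idea of keeping the paging state inside the reader, here is a minimal, dependency-free sketch. It is a model of the pattern, not the Spring Batch class: `PagedReader`, `FakeSearchReader`, `readAll` and the in-memory list standing in for the Search API are all hypothetical names made up for this example. As far as I understand, the read loop in `AbstractPaginatedDataItemReader` behaves along these lines (it calls `doPageRead` whenever the current page is exhausted, advances `page`, and returns null once a page comes back empty), but check this against the version you are using.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class PagingDemo {

    /** Hypothetical stand-in for the paging loop; NOT the Spring Batch class itself. */
    abstract static class PagedReader<T> {
        protected int page = 0;        // index of the next page to fetch
        protected int pageSize = 10;   // number of items requested per call
        private Iterator<T> results;   // items of the page currently being drained

        /** Fetch the page identified by 'page'; return an empty iterator when no data is left. */
        protected abstract Iterator<T> doPageRead();

        /** Returns the next item, or null once a freshly fetched page comes back empty. */
        public T read() {
            if (results == null || !results.hasNext()) {
                results = doPageRead();   // current page exhausted: fetch the next one
                page++;
                if (results == null || !results.hasNext()) {
                    return null;          // empty page signals the end of the data
                }
            }
            return results.next();
        }
    }

    /** Hypothetical reader: an in-memory list plays the role of the remote Search API. */
    static class FakeSearchReader extends PagedReader<String> {
        private final List<String> remoteData;
        int apiCalls = 0;                 // counts simulated API calls

        FakeSearchReader(List<String> remoteData, int pageSize) {
            this.remoteData = remoteData;
            this.pageSize = pageSize;
        }

        @Override
        protected Iterator<String> doPageRead() {
            apiCalls++;
            int from = page * pageSize;
            if (from >= remoteData.size()) {
                return Collections.emptyIterator();
            }
            int to = Math.min(from + pageSize, remoteData.size());
            return remoteData.subList(from, to).iterator();
        }
    }

    /** Drains a reader the way a chunk-oriented step would: call read() until it returns null. */
    static <T> List<T> readAll(PagedReader<T> reader) {
        List<T> items = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        FakeSearchReader reader = new FakeSearchReader(List.of("a", "b", "c", "d", "e"), 2);
        System.out.println(readAll(reader));   // prints [a, b, c, d, e]
        System.out.println(reader.apiCalls);   // prints 4 (three pages with data, one empty)
    }
}
```

In this model nothing outside the reader needs to know about pages: the step just keeps calling `read()`, so no NEXT_PAGE/END transitions are involved in moving from one page to the next.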

Mahmoud Ben Hassine
  • The above sample does use `AbstractPaginatedDataItemReader` already, I was hoping that the state maintenance in `afterStep` should handle the page number, but doesn't seem to be the case - tbh, the state maintenance is where I need some help with – g0c00l.g33k Jul 08 '20 at 04:20
  • The state should be kept in your reader. The afterStep will be called after the entire step, i.e. after all pages have been read by your reader. You need to have an instance variable in your reader for the current page and for how many items have been read from that page. Once a page is fully read, you need to make another call to grab the next page, and so on. – Mahmoud Ben Hassine Jul 08 '20 at 07:29

Looking at the equivalent Java config and the signatures of the on and to methods, to accepts a Flow, Step or JobExecutionDecider. So I think you need to replace

<batch:next on="NEXT_PAGE" to="readPaginatedApi"/>
with
<batch:next on="NEXT_PAGE" to="readApi"/>
  • Hi thanks for the reply, I updated that, still seems to be stuck on page 1 – g0c00l.g33k Jul 01 '20 at 08:31
  • I have a similar requirement: I need to invoke a REST API that returns 200 records at a time from a legacy system. With millions of records to process and only 200 received per call, there will be a lot of calls. I would like to understand whether Spring Batch is the way to go, or whether we should have a simple Spring Boot service that keeps calling the API until it stops returning results. Does Spring Batch add any benefit in such a scenario? In my past experience I have used Spring Batch only when I had to process millions of records at a time. – user2425109 May 17 '23 at 12:57