I need to process data from a REST web service. Here is a basic example:
import org.springframework.batch.item.ItemReader;
import org.springframework.http.ResponseEntity;
import org.springframework.web.client.RestTemplate;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RESTDataReader implements ItemReader<DataDTO> {

    private final String apiUrl;
    private final RestTemplate restTemplate;

    private int nextDataIndex;
    private List<DataDTO> data;

    RESTDataReader(String apiUrl, RestTemplate restTemplate) {
        this.apiUrl = apiUrl;
        this.restTemplate = restTemplate;
        nextDataIndex = 0;
    }

    @Override
    public DataDTO read() throws Exception {
        // Lazily load the whole response on the first call.
        if (dataIsNotInitialized()) {
            data = fetchDataFromAPI();
        }

        DataDTO nextData = null;

        if (nextDataIndex < data.size()) {
            nextData = data.get(nextDataIndex);
            nextDataIndex++;
        }
        else {
            // Everything has been returned: reset the state and return null,
            // which tells Spring Batch that the input is exhausted.
            nextDataIndex = 0;
            data = null;
        }

        return nextData;
    }

    private boolean dataIsNotInitialized() {
        return this.data == null;
    }

    private List<DataDTO> fetchDataFromAPI() {
        ResponseEntity<DataDTO[]> response = restTemplate.getForEntity(apiUrl,
                DataDTO[].class
        );
        DataDTO[] body = response.getBody();
        // getBody() may return null (empty response), so guard against it.
        return body == null ? Collections.emptyList() : Arrays.asList(body);
    }
}
However, my fetchDataFromAPI method is called with a date range, and it can return more than 20 million objects.
For example: if I call it for the range 01/01/2020 to 01/01/2021, I get 80 million objects.
PS: the web service paginates by single day, i.e. if I want to retrieve the data between 01/09/2020 and 07/09/2020, I have to call it several times (01/09-02/09, then 02/09-03/09, and so on until 06/09-07/09).
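To make the call pattern concrete, here is a minimal sketch of the one-day windows I have to request for a given range. The names DayWindows, Window and between are hypothetical and only for illustration (they are not part of the web service or of Spring Batch), and I am assuming the end date of the range is exclusive.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class DayWindows {

    // One call to the web service: from 'from' to 'to', exactly one day apart.
    static final class Window {
        final LocalDate from;
        final LocalDate to;

        Window(LocalDate from, LocalDate to) {
            this.from = from;
            this.to = to;
        }
    }

    // Splits [start, end) into consecutive one-day windows (assumption: end is exclusive).
    static List<Window> between(LocalDate start, LocalDate end) {
        List<Window> windows = new ArrayList<>();
        for (LocalDate day = start; day.isBefore(end); day = day.plusDays(1)) {
            windows.add(new Window(day, day.plusDays(1)));
        }
        return windows;
    }

    public static void main(String[] args) {
        // 01/09/2020 to 07/09/2020, as in the example above.
        for (Window w : between(LocalDate.of(2020, 9, 1), LocalDate.of(2020, 9, 7))) {
            System.out.println(w.from + " -> " + w.to);
        }
    }
}
```

For the range 01/09/2020 to 07/09/2020 this gives six windows, i.e. six separate calls to the web service.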
My problem is that I run out of heap space when the volume of data is large.
To avoid this, I had to create one step per month in my BatchConfiguration (12 steps): the first step calls the web service for 01/01/2020 to 01/02/2020, the second for the following month, and so on.
Is there a way to read this whole volume of data with a single step before it goes to the processor?
Thanks in advance