We developed an application using Spring Batch and Apache POI that reads data from an Oracle DB and writes it into an XLSX file generated with Apache POI. The job has a reader (an ItemReader that reads the data from the DB) and a writer (ItemWriter); below is the relevant code:
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
    log.info("Calling analytic beforeStep");
    cellProperties = new HashMap<String, Object>();
    cellProperties.put(CellUtil.BORDER_LEFT, BorderStyle.THIN);
    cellProperties.put(CellUtil.BORDER_RIGHT, BorderStyle.THIN);
    cellProperties.put(CellUtil.BORDER_BOTTOM, BorderStyle.THIN);
    cellProperties.put(CellUtil.BORDER_TOP, BorderStyle.THIN);
    outputFilename = stepExecution.getJobExecution().getExecutionContext().getString("fileName");
    File xlsxFile = new File(outputFilename);
    if (xlsxFile.exists() && !xlsxFile.isDirectory()) {
        log.info("ViewReportSalesAnalyticsExcelWriter File exist");
        InputStream fileIn = null;
        try {
            fileIn = new BufferedInputStream(new FileInputStream(xlsxFile), 100);
            workbook = new SXSSFWorkbook(new XSSFWorkbook(fileIn), 100);
        } catch (Exception e) {
            log.error("ViewReportSalesAnalyticsExcelWriter error: ", e);
        } finally {
            if (fileIn != null) {
                try {
                    fileIn.close();
                } catch (IOException e) {
                    log.error("ViewReportSalesAnalyticsExcelWriter error: ", e);
                }
            }
        }
    } else {
        log.info("ViewReportSalesAnalyticsExcelWriter File not exist, creating");
        workbook = new SXSSFWorkbook(100);
    }
    // temp files will be gzipped
    workbook.setCompressTempFiles(true);
    Sheet sheet = workbook.createSheet("Report POG - Analitico");
    sheet.setFitToPage(true);
    sheet.setDisplayGridlines(false);
    sheet.setDefaultColumnWidth(23);
    sheet.setDefaultRowHeight((short) 290);
    addTitleToSheet(sheet);
    //currRow++;
    addHeaders(sheet);
    //initDataStyle();
}

@Override
public void write(List<? extends ViewReportSalesAnalyticsDTO> items) throws Exception {
    log.info("Analytic write, number of elements - {} ", items.size());
    Sheet sheet = workbook.getSheetAt(1);
    SimpleDateFormat spf = new SimpleDateFormat(PogEngineBatchConstants.DATE_FORMAT_REPORT);
    Row row = null;
    for (ViewReportSalesAnalyticsDTO data : items) {
        int i = 0;
        currRow++;
        row = sheet.createRow(currRow);
        createStringCell(row, data.getProductCode(), i++);
        createStringCell(row, data.getProductDesc(), i++);
        createStringCell(row, PogEngineBatchUtils.decodeFlagFactory(data.getFlagManufacturer()), i++);
        for (String headerKey : dynamicHeaders) {
            createStringCell(row, data.getSellingJson().get(headerKey) == null ? "OK" : //TODO always OK?
                data.getSellingJson().get(headerKey), i++);
        }
        createStringCell(row, data.getMonitoringOutcome(), i++);
        for (String headerKey : dynamicHeaders) {
            createStringCell(row, data.getMonitorJson().get(headerKey) == null ? "OK" : //TODO always OK?
                data.getMonitorJson().get(headerKey), i++);
        }
    }
}

@AfterStep
public void afterStep(StepExecution stepExecution) throws IOException {
    FileOutputStream fos = new FileOutputStream(outputFilename);
    log.info("ViewReportSalesAnalyticsExcelWriter afterStep, write records to file");
    workbook.write(fos);
    fos.close();
    workbook.dispose();
    workbook.close();
}
In afterStep, during workbook.write(fos);, we are writing about 500,000 records and we get a "GC overhead limit exceeded" error. To work around it we added the VM argument -Xmx8192m to the batch launcher (sh), but the process then takes a long time and uses a lot of memory on the machine (16 GB RAM, 8 cores) before it finishes writing the file (about 110 MB).

Is there anything we can change in our code so memory is not exhausted (the data volume will grow in future executions)? Is there any way to change the behavior of workbook.write() so it does not write 500,000 records in one shot? Maybe we can split the write during the process?
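For context, this is the minimal streaming pattern we understood SXSSF is meant for: the workbook is created once, rows beyond the access window are flushed to gzipped temp files as they are written, and write() then streams those temp files into the final zip. The class name StreamingReportWriter, the file name report.xlsx, and the row count are placeholders, not our real code (our writer re-opens the existing file with XSSFWorkbook in beforeStep, which is not present here):

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class StreamingReportWriter {
    public static void main(String[] args) throws Exception {
        // Keep only 100 rows in memory; older rows are flushed to temp files.
        try (SXSSFWorkbook workbook = new SXSSFWorkbook(100)) {
            workbook.setCompressTempFiles(true); // temp files will be gzipped
            Sheet sheet = workbook.createSheet("Report POG - Analitico");
            for (int r = 0; r < 10_000; r++) { // 500,000 in our real job
                Row row = sheet.createRow(r);
                row.createCell(0).setCellValue("product-" + r);
            }
            // write() streams the flushed temp files into the final zip;
            // buffering the output keeps I/O cheap.
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream("report.xlsx"))) {
                workbook.write(out);
            }
            workbook.dispose(); // delete the temp files
        }
    }
}
```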