
We developed an application using Spring Batch and Apache POI that reads data from an Oracle database and writes it to an XLSX file generated with Apache POI. The job has a reader (an ItemReader that reads the data from the DB) and a writer (an ItemWriter); here is an example:

@BeforeStep
public void beforeStep(StepExecution stepExecution) {
    log.info("Calling analytic beforeStep");

    cellProperties = new HashMap<String, Object>();
    cellProperties.put(CellUtil.BORDER_LEFT, BorderStyle.THIN);
    cellProperties.put(CellUtil.BORDER_RIGHT, BorderStyle.THIN);
    cellProperties.put(CellUtil.BORDER_BOTTOM, BorderStyle.THIN);
    cellProperties.put(CellUtil.BORDER_TOP, BorderStyle.THIN);

    outputFilename = stepExecution.getJobExecution().getExecutionContext().getString("fileName");
    File xlsxFile = new File(outputFilename);

    if (xlsxFile.exists() && !xlsxFile.isDirectory()) {
        log.info("ViewReportSalesAnalyticsExcelWriter File exist");
        try (InputStream fileIn = new BufferedInputStream(new FileInputStream(xlsxFile), 100)) {
            workbook = new SXSSFWorkbook(new XSSFWorkbook(fileIn), 100);
        } catch (Exception e) {
            log.error("ViewReportSalesAnalyticsExcelWriter error: ", e);
        }
    } else {
        log.info("ViewReportSalesAnalyticsExcelWriter File not exist, creating");
        workbook = new SXSSFWorkbook(100);
    }

    // temp files will be gzipped
    workbook.setCompressTempFiles(true);
    Sheet sheet = workbook.createSheet("Report POG - Analitico");
    sheet.setFitToPage(true);
    sheet.setDisplayGridlines(false);
    sheet.setDefaultColumnWidth(23);
    sheet.setDefaultRowHeight((short) 290);

    addTitleToSheet(sheet);
    //currRow++;
    addHeaders(sheet);
    //initDataStyle();
}

@Override
public void write(List<? extends ViewReportSalesAnalyticsDTO> items) throws Exception {
    log.info("Analytic write, number of elements - {} ", items.size());

    // the second tab (index 1); the first tab is written by a previous step
    Sheet sheet = workbook.getSheetAt(1);
    SimpleDateFormat spf = new SimpleDateFormat(PogEngineBatchConstants.DATE_FORMAT_REPORT);
    Row row = null;

    for (ViewReportSalesAnalyticsDTO data : items) {
        int i = 0;
        currRow++;

        row = sheet.createRow(currRow);
        createStringCell(row, data.getProductCode(), i++);
        createStringCell(row, data.getProductDesc(), i++);
        createStringCell(row, PogEngineBatchUtils.decodeFlagFactory(data.getFlagManufacturer()), i++);

        for (String headerKey : dynamicHeaders) {
            createStringCell(row, data.getSellingJson().get(headerKey) == null ? "OK" : //TODO always OK?
                    data.getSellingJson().get(headerKey), i++);
        }
        createStringCell(row, data.getMonitoringOutcome(), i++);

        for (String headerKey : dynamicHeaders) {
            createStringCell(row, data.getMonitorJson().get(headerKey) == null ? "OK" : //TODO always OK?
                    data.getMonitorJson().get(headerKey), i++);
        }
    }
}

@AfterStep
public void afterStep(StepExecution stepExecution) throws IOException {
    log.info("ViewReportSalesAnalyticsExcelWriter afterStep, write records to file");
    // the whole workbook is serialized to disk here, in one shot
    try (FileOutputStream fos = new FileOutputStream(outputFilename)) {
        workbook.write(fos);
    }
    workbook.dispose(); // delete the SXSSF temporary files
    workbook.close();
}

In afterStep, during workbook.write(fos), we are writing about 500,000 records and we get a "GC overhead limit exceeded" error. To avoid this issue we put the VM argument -Xmx8192m in the batch launcher (a shell script), but the process takes a long time and uses a lot of memory on the VM (16 GB, 8 cores) to finish and write the file (about 110 MB).

Is there something we can do in our code to avoid exhausting memory (the data will grow in future executions)? Is there any way to change the behavior of workbook.write() so that it does not write 500,000 records in one shot? Maybe we can split the writing during the process?

  • Why do you need this: `workbook = new SXSSFWorkbook(new XSSFWorkbook(fileIn), 100);`? The XSSFWorkbook can use a lot of memory. – PJ Fanning Feb 05 '21 at 02:14
  • I need it because the first tab of the Excel file is written by a previous step; this is the step for the second tab. Do you think we can manage it in a different way, e.g. share the workbook variable between the different steps of Spring Batch? – Riccardo Barbero Feb 05 '21 at 22:54
  • You are losing the benefit of streaming by reading the interim file into an XSSFWorkbook. I would recommend creating one SXSSFWorkbook and having the two steps write out using the same SXSSFWorkbook. – PJ Fanning Feb 06 '21 at 00:43
  • Really helpful; I solved it by sharing the workbook variable between the different steps using @Bean. The batch seems faster than the previous version. – Riccardo Barbero Feb 10 '21 at 10:53
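
For reference, a minimal sketch of the approach settled on in the comments: expose a single SXSSFWorkbook as a Spring bean and inject it into the writers of both steps, so the interim file never has to be re-read into an XSSFWorkbook. The configuration class and bean name below are illustrative, not from the original code:

import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ReportWorkbookConfig {

    // one streaming workbook shared by the writers of both steps; only the
    // last step's afterStep should call workbook.write(...) and dispose()
    @Bean
    public SXSSFWorkbook reportWorkbook() {
        SXSSFWorkbook workbook = new SXSSFWorkbook(100); // keep 100 rows per sheet in memory
        workbook.setCompressTempFiles(true);             // gzip the temp files
        return workbook;
    }
}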

1 Answer

The code is very similar to https://keyholesoftware.com/2012/11/12/generating-large-excel-files-using-spring-batch-part-three/. This approach does not follow the chunk-oriented processing model proposed by Spring Batch: Writer.write only creates cells in memory, and the workbook is written to disk only in the afterStep callback. With this approach, all items are kept in memory until the entire step is complete, and only then are they written to disk.

Items should be written to disk after each chunk and not after the entire step.
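
As an illustration only (a sketch based on the asker's code, not part of the original answer): with SXSSF, the rows created in the current chunk can be forced out to the temporary file at the end of each write() call via SXSSFSheet.flushRows(), so they become eligible for garbage collection; workbook.write() in afterStep then assembles the final file from the temp files:

import java.util.List;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;

@Override
public void write(List<? extends ViewReportSalesAnalyticsDTO> items) throws Exception {
    // the sheet created in beforeStep is an SXSSFSheet, so it supports row flushing
    SXSSFSheet sheet = (SXSSFSheet) workbook.getSheetAt(1);

    for (ViewReportSalesAnalyticsDTO data : items) {
        currRow++;
        Row row = sheet.createRow(currRow);
        // ... create the cells exactly as in the original write() ...
    }

    // push every row of this chunk to the temp file instead of keeping it on the heap;
    // note that flushed rows can no longer be modified
    sheet.flushRows(0);
}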

  • How can we write to the file after each chunk? Do you have an example or a useful link? – Riccardo Barbero Feb 05 '21 at 22:52
  • I'm not familiar with Apache POI APIs, but you need to flush the buffer and write items to disk after each chunk, so that items are garbage collected and the process continues with the next chunk. – Mahmoud Ben Hassine Feb 08 '21 at 08:55