1

I want to read a large excel-file(.xlsx/.xls). When I upload a 20MB file, suddenly Java-Heap was increased by 2GB and ran into OutOfMemoryError.

private Sheet getSheetForFileType(String filType, InputStream fileData) throws IOException {
    Workbook workbook;
    Sheet sheet;
    if (filType.equalsIgnoreCase("xls")) {
        workbook = new HSSFWorkbook(fileData); //OutOfMemoryError
        sheet = workbook.getSheetAt(0);
    } else {
        workbook = new XSSFWorkbook(fileData); //OutOfMemoryError
        sheet = workbook.getSheetAt(0);
    }
    return sheet;
}

As mentioned here Apache-POI overview , I tried with XSSF and SAX (Event API) Modified code as below:

private Sheet getSheetForFileType(String filType, InputStream fileData) throws IOException {

    if (filType.equalsIgnoreCase("xls")) {
        ....
    } else {
        OPCPackage opcPackage = OPCPackage.open(fileData);  //OutOfMemoryError
        XSSFReader xssfReader = new XSSFReader(opcPackage);
        SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable();
        XMLReader parser = getSheetParser(sharedStringsTable);
        ....
        ....
    }
    return sheet;
}

Yet, I'm unable to load-file and read it.

I read file-data from Input-Stream, purpose is ONLY to Read-Data no write operations on it.

Reading a File takes lower memory, while an InputStream requires more memory as it has to buffer the whole file.

I went through other posts, what i understand:

  • increase Heap Memory
  • excel-streaming-reader [can't use, i've to support both(.xlsx/.xls)]
  • read using SAX parser

Update-1: Added a sample-excel picture.

sample excel

Pawan
  • 301
  • 2
  • 9
  • 1
    Have you looked at other related questions and answers on Stack Overflow that suggest reading from a file rather than from an `InputStream` when encountering `OutOfMemoryErrors`? Is this a viable option? Have you tried this? – Jonny Henly Sep 17 '19 at 16:10
  • @JonnyHenly, yes i searched in other posts. 1st code, i followed in memory, DOM approach(problem here is it can't read until whole file is loaded), 2nd code i followed XSSF and SAX approach as mentioned in the official examples/doc. yet no-luck – Pawan Sep 17 '19 at 16:39
  • 1
    If even the `OPCPackage.open(fileData)` leads to out of memory error, then you have no chance. But I really doubt that because it does nothing else than opening the `ZIP` archive of the `Office Open XML` `*.xlsx`. That normally cannot lead to out of memory error. – Axel Richter Sep 17 '19 at 16:56
  • If you are interested just in values, why not to export all the xls/xlsx files to csv through excel? you can create a small macro just to batch it and work with csv (that won´t give you any problem) better than xls/xlsx if you can not handle it. – David García Bodego Sep 18 '19 at 04:11

1 Answers1

0

Try to use very efficient and high performance streaming SXSSFWorkbook class instead of XSSFWorkbook (which keeps the entire Excel workbook in memory) like below:

SXSSFWorkbook workbook = new SXSSFWorkbook(100);

where 100 is the default number of rows that will be kept in memory and processed in real time.

kels
  • 138
  • 2
  • 11