0

As per business functionality we need to read multiple excel files(both .xls and .xlsx format) at different locations in a multi thread environment. Each thread is responsible for reading a file. In order to test the performance, we have created 2 file sets in both .xls and .xlsx formats. One file set has just 20 row data while other file set contains 300,000 row data. We are able successfully read both files in .xls formats and load data into the table. Even for 20 row data .xlsx file, our source code is working fine. But when the execution flow starts reading .xlsx file, application server is terminated abruptly. When I started tracing down the issue, I have been facing a strange issue while creating XSSFWorkbook instance.Refer the code snippet below:

OPCPackage opcPackage = OPCPackage.open(FILE);
System.out.println("Created OPCPackage instance.");
XSSFWorkbook workbook = new XSSFWorkbook(opcPackage);
System.out.println("Created XSSFWorkbook instance.");
SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(workbook, 1000);
System.out.println("Created SXSSFWorkbook instance.");[/code]

Output

Process XLSX file EXCEL_300K.xlsx start.
Process XLSX file EXCEL.xlsx start.
Created OPCPackage instance.
Created OPCPackage instance.
Created XSSFWorkbook instance.
Created SXSSFWorkbook instance.
Process XLSX file EXCEL.xlsx end.

For larger file set the execution hangs at XSSFWorkbook workbook = new XSSFWorkbook(opcPackage); causing heap space issue. Please do help me to fix this issue.

Thanks in advance.

Thanks,

Sankar.

  • This can either be an error in the library which leads to an endless loop (which will end in an OOME no matter what you do) or perhaps memory is simply to terse. Did you try to increase memory or did you monitor the growth of consumed memory ? – Marged Jul 02 '15 at 13:30
  • 1. I am using Apache POI -3.10 library for the above implementation. – Sankara Sarma Jul 02 '15 at 14:03
  • Great information. And what about the questions regarding memory ? How about updating to 3.12, there were a lof of fixes between 3.10 and 3.12 http://poi.apache.org/changes.html ? – Marged Jul 02 '15 at 14:04
  • 2. I have allocated maximum limit that JVM can use as 1.5 GB . I guess memory is not an issue because I am able to read 300,000 rows in .xls format. 3. We have a memory trace mechanism in place and observed that memory consumed by the process increased by time. By any chance is there any issue with the Apache POI library version that I am using? – Sankara Sarma Jul 02 '15 at 14:12
  • There were many issues, look at the changes made. If your problems still exists after updating to 3.12 you should file a bug. – Marged Jul 02 '15 at 16:42
  • @Marged Yes, that would be my next course of action on this problem. – Sankara Sarma Jul 02 '15 at 17:11
  • 1
    Did you [try following the "too much memory" instructions in the Apache POI FAQ](http://poi.apache.org/faq.html#faq-N10109)? – Gagravarr Jul 02 '15 at 17:27

1 Answers1

0

After trying lot of solutions I found out that processing XLSX files require huge memory. But using POI 3.12 library has multiple advantages.

  1. Processes excel files faster.
  2. Has more API's to handle excel files like closing a working book, opening a excel file using File instance etc.