
I am trying to parse a large Excel file (.xlsx) using the Apache POI XSSF library. After 100,000 rows it throws a heap space error. I tried increasing the memory, but it does not help. Is there a workaround for this problem? Or can someone suggest another library for parsing large Excel files?

Thanks!

user3777066
  • Did you try reading the [Apache POI FAQ entry "I think POI is using too much memory! What can I do?"](http://poi.apache.org/faq.html#faq-N10109) – Gagravarr Sep 26 '14 at 22:30

4 Answers


You can use the XSSF event (SAX) API: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api

Have a look at this thread for details: Efficient way to search records from an excel file using Apache-POI
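To give a feel for the event API, here is a minimal sketch based on the POI how-to linked above. The file name is a placeholder, and the handler just counts rows; real per-cell work goes in `startElement`/`characters` callbacks (exact types such as `XSSFReader` and `OPCPackage` may vary slightly between POI versions):

    import java.io.InputStream;
    import java.util.Iterator;
    import javax.xml.parsers.SAXParserFactory;
    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.xssf.eventusermodel.XSSFReader;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.DefaultHandler;

    public class LargeXlsxParser {
        public static void main(String[] args) throws Exception {
            OPCPackage pkg = OPCPackage.open("large-file.xlsx"); // example path
            try {
                XSSFReader reader = new XSSFReader(pkg);

                XMLReader parser = SAXParserFactory.newInstance()
                        .newSAXParser().getXMLReader();
                // One SAX event per XML element; nothing is kept in memory
                // beyond the handler's own state.
                parser.setContentHandler(new DefaultHandler() {
                    int rows = 0;
                    @Override
                    public void startElement(String uri, String local,
                                             String qName, Attributes atts) {
                        if ("row".equals(local) || "row".equals(qName)) {
                            rows++; // do real per-row work here
                        }
                    }
                    @Override
                    public void endDocument() {
                        System.out.println("rows seen: " + rows);
                    }
                });

                // Each sheet is exposed as a raw XML stream.
                Iterator<InputStream> sheets = reader.getSheetsData();
                while (sheets.hasNext()) {
                    InputStream sheet = sheets.next();
                    try {
                        parser.parse(new InputSource(sheet));
                    } finally {
                        sheet.close();
                    }
                }
            } finally {
                pkg.close();
            }
        }
    }

Because the sheets are streamed as XML, memory use stays roughly constant no matter how many rows the workbook has.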

Renjith

When facing the most common OutOfMemoryError, namely "java.lang.OutOfMemoryError: Java heap space", a few basics must first be understood.

Java applications are allowed to use a limited amount of memory. This limit is specified during application startup. To make things more complex, Java memory is separated into different regions named heap space and permgen.

The size of those regions is set during the Java Virtual Machine (JVM) launch by specifying parameters such as -Xmx and -XX:MaxPermSize. If you do not explicitly set the sizes, platform-specific defaults will be used.

So – the "java.lang.OutOfMemoryError: Java heap space" error will be triggered when you try to add more data into the heap space area, but there is not enough room for it.

Based on this simple description, you have two options:

  • Give more room to the data structures
  • Reduce the size of the data structures used

Giving more room is easy - just increase the heap size by changing the -Xmx parameter, similar to the following example giving your Java process 1G of heap to play with:

java -Xmx1024m com.mycompany.MyClass

Reducing the size of the data structures typically takes more effort, but this might be necessary in order to get rid of the underlying problems - giving more room can sometimes just mask the symptoms and postpone the inevitable. For example, when facing a memory leak you are just postponing the time when all the memory is filled with leaking garbage.

In your case, reading the data in smaller batches and processing one batch at a time might be an option.
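As a hypothetical sketch of that batching idea in plain Java (the row source and `handleBatch` work are placeholders): instead of accumulating every row in one big list, handle a fixed-size batch and discard it before reading the next one, so peak memory stays bounded by the batch size:

    import java.util.ArrayList;
    import java.util.List;

    public class BatchProcessor {
        static int process(Iterable<String> rows, int batchSize) {
            List<String> batch = new ArrayList<>(batchSize);
            int processed = 0;
            for (String row : rows) {
                batch.add(row);
                if (batch.size() == batchSize) {
                    processed += handleBatch(batch);
                    batch.clear(); // free the rows we no longer need
                }
            }
            if (!batch.isEmpty()) {
                processed += handleBatch(batch); // last partial batch
            }
            return processed;
        }

        static int handleBatch(List<String> batch) {
            // Placeholder for real work, e.g. writing the rows to a database.
            return batch.size();
        }
    }

With a batch size of, say, 1,000, at most 1,000 rows are alive at once regardless of how many rows the input has.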

Ivo

I'm aware that the question is 9 years old, but I hit the same problem today trying to write big or huge Excel workbooks with Apache POI 3.8. It turned out that SXSSFWorkbook was the solution: at first I used HSSFWorkbook, which worked fine for workbooks of 40k rows but failed at 100k rows. I then switched to XSSFWorkbook, which handled some workbooks of 200k rows but failed at 460k rows, so I switched to SXSSF and got the job done.

So, my code looks like this (note the integer parameter in new SXSSFWorkbook(1024), which is the number of rows to keep in memory, and the workbook.dispose() at the end, which is not part of the Workbook interface):

    SXSSFWorkbook workbook = new SXSSFWorkbook(1024); // keep at most 1024 rows in memory
    CreationHelper createHelper = workbook.getCreationHelper();
    ...

    try (FileOutputStream fileOut = new FileOutputStream(...)) {

        workbook.write(fileOut);

    } catch (IOException e) {

        throw new ExceptieSistem(EroareEsantion.EROARE_EXPORT_XLS_IO, e);

    } finally {

        workbook.dispose();   // new! Dispose of the temporary files backing this workbook on disk.
    }
Mihai Cazac

Try the latest (stable!) version of Apache POI.

An alternative might be smartXLS.

Benvorth