
I am working with the Java Apache POI library on large Excel files: roughly 10 MB of data with many rows and columns, spread over 8-10 sheets per file. The data is not plain text; it is full of formulas and functions such as =SUM(A2:A4), and I do not care about the formulas themselves.

This image is for illustration only; the functions in the actual data are different and far more complex:

enter image description here

The data includes strings, numbers, and Boolean values. All I want is for XSSF to read the cell values as plain text, ignoring the formulas and functions applied in Excel. In the image above, for example, I only want to read the values in the rows and columns, i.e. 10, 20, 30, etc., "Numbers", and "Total".

Problem

If I edit the Excel sheets to remove all formulas and functions and save the data as plain values, my code runs. However, when I leave the Excel files unmodified, with the data as shown above, I run into a GC overhead limit exceeded error.

What I want

I just want to read Excel files full of formulas and functions as they are. My algorithm works when I remove all the formulas and keep the sheet contents as plain values.

What I tried

As suggested in other resources online and on Stack Overflow, I first tried the approach in the code below:

    fis = new FileInputStream(path);
    opc = OPCPackage.open(fis);
    XSSFWorkbook workbook = new XSSFWorkbook(opc);

Rather than passing a plain FileInputStream as input, I first wrapped it in an OPCPackage. It still fails with the same error, and execution never gets past the XSSFWorkbook workbook line.

I then tried a second approach with XSSFReader. The code is below:

    xssfReader = new XSSFReader(opc);
    SharedStringsTable sst = xssfReader.getSharedStringsTable();
    XSSFReader.SheetIterator itr = (XSSFReader.SheetIterator) xssfReader.getSheetsData();

    while (itr.hasNext()) {
        InputStream sheetStream = itr.next();
        if (itr.getSheetName().equals(sheetName)) {

            // no idea how to extract sheet like I would do in XSSFWorkbook
            // I only get Sheet name of desired sheet

        } // if ends here
    } // while ends here

Nothing has worked so far, and whenever I use XSSFWorkbook it throws the GC overhead limit exceeded error. So for now I am removing all formulas and functions by hand, after which the algorithm works, but that is not an efficient way to deal with the problem. Any help or suggestions are appreciated.
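For the XSSFReader route, it may help to know what the sheet stream contains: each worksheet is an XML part in which a formula cell stores the formula text in an `<f>` element and the last calculated result in a `<v>` element. The sketch below uses only the JDK's SAX parser to collect the `<v>` values and skip `<f>`, which is the same principle POI's event API builds on. It is a minimal illustration, not POI's own implementation: the class name `CachedValueReader`, the method `readCachedValues`, and the sample XML are all hypothetical, and it assumes unprefixed element names and ignores inline strings (`<is>`), dates, and number formatting.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams one worksheet XML part and keeps only cached values.
class CachedValueReader {

    /** Returns "cellRef=value" for every cell that carries a <v> element. */
    static List<String> readCachedValues(InputStream sheetXml, List<String> sharedStrings)
            throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String ref;       // cell reference such as "A4"
            private String type;      // "s" means <v> holds a shared-string index
            private boolean inValue;  // true while the parser is inside <v>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("c".equals(qName)) {
                    ref = atts.getValue("r");
                    type = atts.getValue("t");
                } else if ("v".equals(qName)) {
                    inValue = true;
                    text.setLength(0);
                }
                // <f> (the formula text) is deliberately not handled.
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inValue) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("v".equals(qName)) {
                    inValue = false;
                    String raw = text.toString();
                    String value = "s".equals(type)
                            ? sharedStrings.get(Integer.parseInt(raw))
                            : raw;
                    out.add(ref + "=" + value);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sheet fragment: a header, two numbers, and a SUM formula cell.
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>10</v></c></row>"
                + "<row r=\"3\"><c r=\"A3\"><v>20</v></c></row>"
                + "<row r=\"4\"><c r=\"A4\"><f>SUM(A2:A3)</f><v>30</v></c></row>"
                + "</sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readCachedValues(in, Arrays.asList("Numbers")));
    }
}
```

In the loop above, `sheetStream` could be passed to such a handler in place of the in-memory sample, with the shared strings taken from the workbook's shared strings part. Because the parser only ever holds one cell's text at a time, memory use stays roughly constant regardless of how many rows the sheet has.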

EDIT:

As pointed out in the link here, I tried allocating more memory, but it still does not work. Below are some screenshots of my attempts to allocate more memory.

enter image description here enter image description here

If I am allocating memory incorrectly, let me know and I will make the needed change.

New Edit

I have solved my problem as described in centic's comment below, by adding -Xmx8192m to my run configuration in Eclipse. I am now looking into other ways of solving the memory issue, such as using SXSSFWorkbook as discussed in the answer below.

Radheya
  • Possible duplicate of [GC overhead limit exceeded with Apache POI](http://stackoverflow.com/questions/33368612/gc-overhead-limit-exceeded-with-apache-poi) – huellif Apr 20 '17 at 14:41
  • @huellif I tried allocating more memory, but it's not working. I also edited my question. – Radheya Apr 20 '17 at 15:05
  • The memory settings you show are for the Eclipse IDE and Java Web Start. How are you actually starting your application? If you run it as an application or unit test inside Eclipse, you need to adjust the memory settings in the run configuration instead, so that they apply when your own code is running. – centic Apr 21 '17 at 06:16
  • I just arrived at my workplace. I am starting my application directly from Eclipse as of now. I will try adjusting the settings in the run configuration now. Let's see if it works – Radheya Apr 21 '17 at 07:25
  • @centic I solved my problem with your trick. I just added `-Xmx8192m` in the run configuration. It solved my problem, but I hope that when I finish my project it will also run successfully on my colleagues' computers. Is there a way I can accept your comment as the final answer? – Radheya Apr 21 '17 at 08:02
  • Added an answer... – centic Apr 22 '17 at 12:32

2 Answers


Have you tried opening the file as an SXSSF workbook instead of an XSSF workbook?

    fis = new FileInputStream(path);
    opc = OPCPackage.open(fis);
    XSSFWorkbook workbook = new XSSFWorkbook(opc);
    SXSSFWorkbook wb = new SXSSFWorkbook(workbook);

See https://poi.apache.org/apidocs/org/apache/poi/xssf/streaming/SXSSFWorkbook.html. Taken directly from their JavaDoc: "This allows to write very large files without running out of memory as only a configurable portion of the rows are kept in memory at any one time"

tomgeraghty3
  • Yes, I tried exactly this, but my code throws the GC overhead limit exceeded exception right after the `XSSFWorkbook workbook = new XSSFWorkbook(opc);` line. I just don't get why. I have no problem when I remove the formulas from the Excel file. I have a huge amount of data. – Radheya Apr 20 '17 at 15:10
  • Hmm. Does this work? `FileInputStream fis = new FileInputStream(excelFile); Workbook workBook = new SXSSFWorkbook(200); workBook = WorkbookFactory.create(fis);` – tomgeraghty3 Apr 20 '17 at 15:13
  • I tried this now and I get the same error after the line `workBook = WorkbookFactory.create(fis);` – Radheya Apr 21 '17 at 07:44

Posting my comment as an answer:

The memory settings you show are for the Eclipse IDE and Java Web Start. How are you actually starting your application? If you run it as an application or unit test inside Eclipse, you need to adjust the memory settings in the run configuration instead, so that they apply when your own code is running.
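One way to confirm that a setting such as -Xmx8192m actually reaches the JVM that runs your code is to print the heap limit and the JVM arguments from inside the application. The sketch below uses only standard JDK calls; the class name `HeapCheck` and the method `maxHeapMiB` are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic class: run it with the same run configuration as the real code.
class HeapCheck {

    /** Maximum heap the running JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Arguments passed to the JVM itself, e.g. -Xmx8192m (not the program arguments). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

If the printed limit does not change after editing a setting, that setting is probably being applied to a different process (for example the IDE itself) rather than to the launched application.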

centic