I am working with the Java Apache POI library and dealing with huge Excel files: roughly 10 MB of data with many rows and columns, spread over 8-10 sheets per file. The cells are not plain values but are full of built-in functions and formulas, e.g. =SUM(A2:A4) and so on, which I am not interested in.
This image is just for illustration; the functions in the actual data are quite different and far more complex:
The data includes strings, numbers and boolean values. All I want is for XSSF to read the cell values as normal text, ignoring the formulas and functions applied in Excel. In the image above, for example, I only want to read the values in the rows and columns, i.e. 10, 20, 30, etc., Numbers, Total.
Problem
If I edit the Excel sheets to remove all formulas and functions and save the data as plain values, my code runs. However, when I leave the files unmodified, with the data in the format shown above, I run into a GC overhead limit exceeded error.
What I want
I just want to read Excel files full of formulas and functions as they are. My algorithm already works once the formulas are removed and the sheets contain only plain values.
What I tried
As suggested in other resources online and on Stack Overflow, I tried a first approach, shown in the code below:
fis = new FileInputStream(path);
opc = OPCPackage.open(fis);
XSSFWorkbook workbook = new XSSFWorkbook(opc);
Rather than passing the FileInputStream directly to the workbook, I first wrapped it in an OPCPackage. It still shows the same error, and execution never gets past the XSSFWorkbook workbook line.
I then tried a second approach with XSSFReader. Below is the code:
xssfReader = new XSSFReader(opc);
SharedStringsTable sst = xssfReader.getSharedStringsTable();
XSSFReader.SheetIterator itr = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
while (itr.hasNext()) {
    InputStream sheetStream = itr.next();
    if (itr.getSheetName().equals(sheetName)) {
        // no idea how to extract sheet like I would do in XSSFWorkbook
        // I only get Sheet name of desired sheet
    }
} // while ends here
Nothing works for me so far: if I use XSSFWorkbook, it throws a GC overhead limit exceeded error. So currently I remove all formulas and functions by hand, after which the algorithm works, but that is not an efficient way to deal with the problem. Any help or suggestions are appreciated.
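For the second approach, one possible way to consume sheetStream without building the whole workbook is to stream the sheet XML with SAX and keep only the cached values. Below is a minimal sketch under stated assumptions, not a tested solution for the real files: it uses only JDK classes, the names CachedValueSketch and readCachedValues are made up for illustration, and it relies on each formula cell storing its formula in an <f> element and its last computed result in a <v> element. For string cells (t="s") the <v> text is an index into the shared strings table, which this sketch does not resolve.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of POI): streams one worksheet XML part and
// collects the text of every <v> element, ignoring <f> formula elements.
class CachedValueSketch {

    static List<String> readCachedValues(InputStream sheetXml) throws Exception {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inValue;                          // true while inside <v>...</v>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("v".equals(qName)) {
                    inValue = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Formula text inside <f> is skipped because inValue is false there.
                if (inValue) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("v".equals(qName)) {
                    inValue = false;
                    values.add(text.toString());
                }
            }
        };
        // The default factory is not namespace-aware, so qName is the raw tag name.
        SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory stand-in for a sheet part; A5 holds a formula plus its cached result.
        String xml = "<worksheet><sheetData>"
                + "<row r=\"2\"><c r=\"A2\"><v>10</v></c></row>"
                + "<row r=\"3\"><c r=\"A3\"><v>20</v></c></row>"
                + "<row r=\"4\"><c r=\"A4\"><v>30</v></c></row>"
                + "<row r=\"5\"><c r=\"A5\"><f>SUM(A2:A4)</f><v>60</v></c></row>"
                + "</sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readCachedValues(in));
    }
}
```

In the XSSFReader loop, sheetStream would take the place of the in-memory XML used in main. Because the handler keeps only one cell's text at a time, memory use should depend on the number of values collected rather than on the size of the sheet XML.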
EDIT:
As pointed out in the link here, I tried allocating more memory, but it is still not working. Below are some snapshots of my attempts to allocate more memory.
If I am doing something wrong in allocating memory, let me know and I will make the needed change.
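A quick way to check whether a heap setting actually reaches the JVM that runs the code is to print the maximum heap from inside the program. This is a minimal sketch; the class name HeapCheck and the method maxHeapMiB are made up for illustration.

```java
// Hypothetical helper: reports the maximum heap the running JVM will attempt to use.
class HeapCheck {

    static long maxHeapMiB() {
        // Runtime.maxMemory() returns bytes; convert to mebibytes.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

If the printed value does not change after editing the memory settings, the option is probably being applied somewhere other than the launch that runs this code. The reported figure can be somewhat lower than the configured -Xmx value, depending on the JVM and garbage collector.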
New Edit
I have solved my problem, as mentioned in centic's comment below, by adding -Xmx8192m to my run configuration in Eclipse. I am now looking into other ways of solving the memory issue using SXSSFWorkbook, as already discussed in the answer below.