I am using POI in my J2EE web application to generate a workbook. However, i find that POI takes around 3 mins to create a workbook with 25K rows(with around 15 columns each). Is this a POI performance issue , or is it justified to take that much of time? Are there other APIs known for better performance ?
-
@Gugusse, accepted answers are an indicator to those reading the question what the _asker_ considered the right answer to his/her actual problem. That is a very important piece of information. That said, any reason you did not comment the need for the asker to provide a minimal code snippet showing the behaviour described? – Thorbjørn Ravn Andersen Feb 14 '11 at 22:33
-
@Gugussee, he is just guessing. The actual reason could be anything, which the code sample would show immediately. Would it be reasonable to suggest you improve your meta-policing? – Thorbjørn Ravn Andersen Feb 15 '11 at 16:52
-
@Thorbjørn Ravn Andersen: Would it be reasonable to suggest you improve your meta-meta-policing? ;) – Gugussee Feb 15 '11 at 17:45
-
@Gugussee, I'll consider that when you have caught up with me... – Thorbjørn Ravn Andersen Feb 15 '11 at 18:01
6 Answers
The performance of writing large files with POI can be heavily reduced if you used the 'streaming' POI API instead of the standard one. Indeed by default POI will keep all your data in memory before writing all in one go at the end. The memory footprint of this can be ridiculously large for big files. Instead using the streaming API you can control how memory is used and data is written to disk progressively.
In order to create a streaming workbook, use something like :
SXSSFWorkbook book = new SXSSFWorkbook();
book.setCompressTempFiles(true);
SXSSFSheet sheet = (SXSSFSheet) book.createSheet();
sheet.setRandomAccessWindowSize(100);// keep 100 rows in memory, exceeding rows will be flushed to disk
// ...

- 1,507
- 3
- 17
- 24
-
2This made a huge difference for my use case. I had a relatively simple sheet with 45,000 rows and 20 columns, with no formulas or fancy formatting. Running on my production VM, it originally took a bit less than 2 minutes to generate. After switching to the streaming workbook, it's down to 6 seconds. Wow. – Ken Pronovici Mar 24 '18 at 01:19
I would be very surprised to see POI take that much time to generate such a file. I just generated a sheet with 30000 rows x 10 cells in about 18s (no formatting, to be fair). The cause might be one of the following:
- POI logging might be turned on, as described here
- you are running from swap memory
- your VM available heap might be very low

- 10,017
- 5
- 38
- 51
-
Will having international characters make the processing slower? And another question, how can increasing VM memory improve performance? – The Machine Apr 12 '10 at 11:55
-
1I don't believe international characters would make this kind of processing any slower: it's mostly about the amount of data. As for VM available heap, as the required amount of memory comes close to the available heap, the garbage collector has to kick in more often: in extreme cases, most of the CPU time is spent garbage collecting. This is a specific situation: it's not likely you're significantly affected by it. – Tomislav Nakic-Alfirevic Apr 12 '10 at 14:34
-
3in my experience POI is kinda slow and if POI needs a s***load of memory or needs logging to be turned off, then it definitely **is** a POI issue. We generate reports using POI and as soon as we generate more than a few spreadsheets it becomes very slow. Also, 30000 rows x 10 cells is really a trivially tiny amount of data for a CPU doing billions of cycles per second. So, yup, POI is quite a slow API. – Gugussee Feb 14 '11 at 10:35
-
2@Gugussee: I would say that comparing persistence of 30k rows against the CPU clock is a bit misleading. If working with large spreadsheets efficiently was easy to do, the two flagship office productivity suites wouldn't have required so many engineer/years to break the 64k-row limit boundary. – Tomislav Nakic-Alfirevic Jun 29 '11 at 10:19
If none of the other answers work out, see if Andy Khan's JExcel will be better. I've found it to be far superior to POI for dealing with Excel in Java.

- 305,152
- 44
- 369
- 561
We also use POI in our web app and do not have any performance issue with it - although our generated documents are far smaller than yours. I would first check if POI is the real issue here. Try to generate those documents without the J2EE-overhead (Unit-Test) and measure the performance. You could also monitor the load and memory usage on your J2EE server to see if the problems come from some suboptimal system settings.

- 389
- 1
- 3
I've compared Apache POI with JExcel library. It seems that JExcel is about up to 4x faster than Apache POI but memory consumption seems to be more or less the same:
@Test
public void createJExcelWorkbook() throws Exception {
WritableWorkbook workbook = Workbook.createWorkbook(new File("jexcel_workbook.xls"));
WritableSheet sheet = workbook.createSheet("sheet", 0);
for ( int i=0; i < 65535; i++) {
for ( int j=0; j < 10; j++) {
Label label = new Label(j, i, "some text " + i + " " + j);
sheet.addCell(label);
}
}
workbook.write();
workbook.close();
}
@Test
public void createPoiWorkbook() throws Exception {
Workbook wb = new HSSFWorkbook();
Sheet sheet = wb.createSheet("sheet");
for ( int i=0; i < 65535; i++) {
Row row = sheet.createRow(i);
for ( int j=0; j < 10; j++) {
Cell cell = row.createCell(j);
cell.setCellValue("some text " + i + " " + j);
}
}
FileOutputStream fileOut = new FileOutputStream("poi_workbook.xls");
wb.write(fileOut);
fileOut.close();
}
I've tested it with JExcel version 2.6.12 and Apache POI version 3.7. You need to download the latest library versions yourself and run the simple tests above to get more accurate numbers.
<dependency org="org.apache.poi" name="poi" rev="3.7"/>
<dependency org="net.sourceforge.jexcelapi" name="jxl" rev="2.6.12"/>
Note: there is a limit in Apache POI of 65535 rows per sheet.

- 981
- 2
- 11
- 18
-
4The row limit is an Excel .xls file format limitation, not a POI one. If you use the .xlsx file format (using XSSF from Apache POI) then you can create more rows – Gagravarr Jan 05 '12 at 08:25
By default POI stores the document in memory using XmlBeans, which is a lot of stuff to store for a large XLSX document.
It only becomes an XLSX document (a zip file) when you stream it out.
For large documents this is a memory hog and can be slow.
The SXSSF classes avoid some of this overhead by writing the data file to disc as it gets built up, merging that with the rest of the file (from memory) when the XLSX is output. This is a LOT faster for large workbooks, but does have limitations.
There is another approach, which is much less flexible, but a lot more efficient. This approach is to treat most of the files in the XLSX as almost fixed and to stream the whole lot out as you generate. This has its own limitations (you will have to k ow all yiur formatting up front), but is fast and memory efficient. My implementation of this approach is here: https://github.com/Yaytay/vertx-xlsx-writer, but it's pretty straightforward.
My assumption would be that the in-memory model used by Excel has very little in common with the XLSX format and that it builds up each file on at a time by scanning that model for the relevant information. You can do the same, just picking up the bits of the file format that you need.

- 493
- 4
- 13