I build a very large POI workbook on my web server. Holding the entire workbook in memory will not scale for multiple concurrent requests. Is there a way I can progressively write the workbook to the servlet output stream? That should reduce the response time as well as make the process memory efficient.
5 Answers
If you are about to generate Excel 2007 (xlsx), you could adapt the approach of BigGridDemo.java as described here: http://web.archive.org/web/20110821054135/http://www.realdevelopers.com/blog/code/excel
The solution is to let POI generate a container xlsx as a template only and stream the actual spreadsheet data as XML into a zip output stream. Generating and streaming that XML is then up to you.
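For illustration, a rough sketch of that idea might look like the following. The method name, the xl/worksheets/sheet1.xml entry name and the inline-string cells are my own assumptions, not taken from the linked demo; the template is an xlsx you generate with POI beforehand, containing an empty placeholder sheet.

import java.io.*;
import java.util.Enumeration;
import java.util.zip.*;

// Copies a template xlsx to the output stream, replacing the placeholder sheet
// entry with sheet XML that is generated row by row.
public static void streamSheetInto(File templateXlsx, String sheetEntryName,
                                   OutputStream out, int rowCount) throws IOException {
    byte[] buf = new byte[8192];
    try (ZipFile template = new ZipFile(templateXlsx)) {
        ZipOutputStream zos = new ZipOutputStream(out);
        // 1. copy every part of the template except the sheet we stream ourselves
        Enumeration<? extends ZipEntry> entries = template.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            if (entry.getName().equals(sheetEntryName)) continue;
            zos.putNextEntry(new ZipEntry(entry.getName()));
            try (InputStream in = template.getInputStream(entry)) {
                for (int n; (n = in.read(buf)) != -1; ) zos.write(buf, 0, n);
            }
            zos.closeEntry();
        }
        // 2. write the sheet XML straight into the zip; rows go out as they are produced
        zos.putNextEntry(new ZipEntry(sheetEntryName));
        Writer w = new OutputStreamWriter(zos, "UTF-8");
        w.write("<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><sheetData>");
        for (int r = 1; r <= rowCount; r++) {
            w.write("<row r=\"" + r + "\"><c r=\"A" + r + "\" t=\"inlineStr\"><is><t>row " + r + "</t></is></c></row>");
        }
        w.write("</sheetData></worksheet>");
        w.flush();
        zos.closeEntry();
        zos.finish();   // finish the zip without closing the servlet output stream
    }
}

The container parts (content types, relationships, workbook.xml) come straight from the template, so only the potentially huge sheetData is generated on the fly and written directly to the zip stream.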

- This works like a charm and is the best answer here so far, because this approach requires neither excessive memory nor flushing the main contents to disk, nor does it cause TTFB timeouts on the client. No need to know the full document contents in advance. – Sergey Shcherbakov May 10 '19 at 09:36
The situation has improved considerably since the rest of the answers were written: streaming is now part of Apache POI.
See the SXSSFWorkbook class, and the documentation here. It uses a streaming window over the sheet, flushing rows that fall outside the window to temporary files.
This is based on the BigGridDemo approach used in hlg's answer, but is now part of the official distribution.
Here's the example from the documentation:
import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.junit.Assert;

public static void main(String[] args) throws Throwable {
    // keep 100 rows in memory; exceeding rows will be flushed to disk
    SXSSFWorkbook wb = new SXSSFWorkbook(100);
    Sheet sh = wb.createSheet();
    for (int rownum = 0; rownum < 1000; rownum++) {
        Row row = sh.createRow(rownum);
        for (int cellnum = 0; cellnum < 10; cellnum++) {
            Cell cell = row.createCell(cellnum);
            String address = new CellReference(cell).formatAsString();
            cell.setCellValue(address);
        }
    }
    // rows with rownum < 900 have been flushed and are not accessible
    for (int rownum = 0; rownum < 900; rownum++) {
        Assert.assertNull(sh.getRow(rownum));
    }
    // the last 100 rows are still in memory
    for (int rownum = 900; rownum < 1000; rownum++) {
        Assert.assertNotNull(sh.getRow(rownum));
    }
    FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx");
    wb.write(out);
    out.close();
    // dispose of temporary files backing this workbook on disk
    wb.dispose();
}
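For the servlet case in the question, the same workbook can be written to the response instead of a file. A rough sketch follows; the response headers and filename are my assumptions, not part of the POI example above.

// Sketch: same SXSSF example, but written to the servlet response.
response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
response.setHeader("Content-Disposition", "attachment; filename=\"report.xlsx\"");
SXSSFWorkbook wb = new SXSSFWorkbook(100);      // 100-row window, older rows go to temp files
try {
    Sheet sh = wb.createSheet();
    // ... create the rows exactly as in the example above ...
    wb.write(response.getOutputStream());       // streams the finished parts to the client
} finally {
    wb.dispose();                               // delete the temporary backing files
}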

- This example seems to still keep the whole workbook in memory before streaming it out. Interesting whether the wb.write(out) call can be placed into the for loop in order to address the initial question. – Sergey Shcherbakov Apr 10 '19 at 15:56
- The problem is also not only memory, but the time sometimes needed to prepare the entire document. It seems reasonable to start streaming the data out as it becomes available, instead of waiting for all of it first. – Sergey Shcherbakov Apr 10 '19 at 15:58
- @SergeyShcherbakov Thanks for the comment, but I'm afraid that's not correct. `SXSSFWorkbook` flushes temporary data to disk behind the scenes, so this example does not keep the whole workbook in memory. – Timothy Jones May 09 '19 at 00:33
- A temporary format is necessary because the xlsx format can't be written as a stream unless you know the whole document first (it has summary information and forward references) - and this temporary format is handled by the `SXSSFWorkbook`. For more information, see the class documentation. – Timothy Jones May 09 '19 at 00:33
- I have successfully implemented streaming of a very big Excel document to a web client using the approach from the winning answer here: http://web.archive.org/web/20110821054135/http://www.realdevelopers.com/blog/code/excel No excessive memory, no flushing to disk, no TTFB timeouts on the client. – Sergey Shcherbakov May 10 '19 at 09:25
- So "can't be written as a stream unless you know the whole document first" is not really true. Some preparation is needed, but I was able to stream to an open OutputStream without knowing the document contents in advance. In fact I was dynamically generating the contents. – Sergey Shcherbakov May 10 '19 at 09:31
- Glad you got it working! My understanding is that some documents still need some information about the whole document to be known ahead of time (but clearly not all documents or all information, as you pointed out). The example at that link is based on the `BigGridDemo` example, which became the `SXSSFWorkbook` class. The workbook class is pretty general (it supports limited random access), and still allows streaming a POI document without keeping the whole thing in memory. If your use case is more specific, a custom class like yours could absolutely guarantee better performance. – Timothy Jones May 13 '19 at 07:07
Unfortunately, that's not possible: the workbook can't be produced as a sequential stream of data. I'd suggest looking for another format, e.g. CSV or XML. Both can be written out sequentially. If the data comes from a DB, it can even be done more efficiently, since a decent DB has built-in facilities to export to those formats efficiently. You just have to stream the bytes from one side to the other.
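As an illustration of the CSV route, here is a minimal sketch. The DataSource, query and column names are made up for the example, and a real implementation would still need to escape commas and quotes in the values.

import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

// Streams query results to the client as CSV, one row at a time.
void exportCsv(DataSource ds, HttpServletResponse response) throws Exception {
    response.setContentType("text/csv");
    response.setHeader("Content-Disposition", "attachment; filename=\"export.csv\"");
    try (Connection con = ds.getConnection();
         Statement st = con.createStatement();
         ResultSet rs = st.executeQuery("SELECT id, name, amount FROM orders");
         PrintWriter out = response.getWriter()) {
        out.println("id,name,amount");
        while (rs.next()) {
            // each row is written to the response as soon as it is read
            out.println(rs.getLong("id") + "," + rs.getString("name") + "," + rs.getBigDecimal("amount"));
        }
    }
}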

Did you try writing the workbook directly to HttpServletResponse.getOutputStream()?
Please take a look at the following example:
HSSFWorkbook wb = new HSSFWorkbook();
HSSFSheet sheet = wb.createSheet("new sheet");
...
OutputStream out = response.getOutputStream();
wb.write(out);
out.close();

- -1 This does not answer the question. The OP explicitly states that the issue he is trying to solve is how to avoid creating everything and then writing it out all at once. – Ophidian Oct 20 '10 at 13:49
If you use JExcel, it has sample code for streaming to and from a servlet: http://jexcelapi.sourceforge.net/resources/faq/
The only downside is that this API looks like it only supports formats up to Excel 2003 inclusive.
Using POI: can you not create the file first and then serve the file's bytes to the servlet output stream?
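That could look roughly like the sketch below, where wb and response are assumed to come from the surrounding servlet code.

// Sketch: finish the workbook in a temp file, then copy its bytes to the client.
File tmp = File.createTempFile("report", ".xls");
try (FileOutputStream fos = new FileOutputStream(tmp)) {
    wb.write(fos);                          // POI writes the complete file to disk
}
response.setContentType("application/vnd.ms-excel");
response.setContentLength((int) tmp.length());
try (InputStream in = new FileInputStream(tmp);
     OutputStream out = response.getOutputStream()) {
    byte[] buf = new byte[8192];
    for (int n; (n = in.read(buf)) != -1; ) {
        out.write(buf, 0, n);               // only the 8 KB buffer is ever in memory
    }
} finally {
    tmp.delete();
}

This keeps heap usage bounded, but it does not shorten the time to first byte, since the whole workbook must still be built before anything is sent.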

- I am writing the POI workbook to the servlet output stream. What happens behind the scenes is that it writes the bytes into the output stream. My question is, I don't know how to write it progressively. I have to wait until the entire workbook is created, and only then write it to I/O. And it takes around a minute to write to I/O. Don't know if that is justified. – The Machine Apr 21 '10 at 05:23
- How big is your spreadsheet? Do you create the spreadsheet, or is it given to you? If it is given to you, do you modify it? What is taking so much time to write to I/O? – Romain Hippeau Apr 21 '10 at 11:45