47

This is sort of inline w/ Writing a large ResultSet to a File but the file in question is an Excel file.

I'm using the Apache POI library to write an Excel file with a large data set retrieved from a ResultSet object. The data could range from a few thousand records to about 1 million; not sure how this translates into file system bytes in Excel format.

The following is a test code I wrote to check out the time taken to write such a large result set and also the performance implication w.r.t CPU & Memory.

protected void writeResultsetToExcelFile(ResultSet rs, int numSheets, String fileNameAndPath) throws Exception {

    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fileNameAndPath));
    int numColumns = rs.getMetaData().getColumnCount();

    Workbook wb = ExcelFileUtil.createExcelWorkBook(true, numSheets);
    Row heading = wb.getSheetAt(0).createRow(1);

    ResultSetMetaData rsmd = rs.getMetaData();

    for(int x = 0; x < numColumns; x++) {
        Cell cell = heading.createCell(x+1);
        cell.setCellValue(rsmd.getColumnLabel(x+1));
    }

    int rowNumber = 2;
    int sheetNumber = 0;

    while(rs.next()) {

        if(rowNumber == 65001) {
            log("Sheet " + sheetNumber + "written; moving onto to sheet " + (sheetNumber + 1));
            sheetNumber++;
            rowNumber = 2;
        }

        Row row = wb.getSheetAt(sheetNumber).createRow(rowNumber);
        for(int y = 0; y < numColumns; y++) {
            row.createCell(y+1).setCellValue(rs.getString(y+1));
            wb.write(bos);
        }

        rowNumber++;
    }

    //wb.write(bos);

    bos.close();
}

Not much luck with the above code. The file which is created seems to grow rapidly (~70Mb per sec). So I stopped the execution after about 10 minutes (killed the JVM when the file reaches 7Gb) and tried to open the file in Excel 2007. The moment I open it, the file size becomes 8k(!) and only the header and the first row are created. Not sure what I'm missing here.

Any ideas?

Community
  • 1
  • 1
Ranga
  • 618
  • 1
  • 8
  • 14
  • Have you printed out the size of your result set to verify the number of lines actually been output? Also, does your filesystem readily handle files that are 7GB? – Gian Sep 01 '11 at 17:32
  • Yes, the number of records in the ResultSet are about 944,000. I'm running on 32-bit XP Pro SP3 w/ about 3 gig RAM @ 800 MHz & Intel i7 clocking 2.8GHz. – Ranga Sep 01 '11 at 17:36
  • I don't think you need to go to a new sheet every time you had 65000 rows written, unless that's within your own requirements. – Wivani Sep 01 '11 at 18:34
  • @Wivani that was cos' Excel 2003 has a 65k limit on the number of rows in a given sheet. I believe it also has a limit on the number of columns per sheet - 255. – Ranga Sep 01 '11 at 20:15
  • 1
    But POI isn't using Excel 2003's format. So that limit is obsolete – Wivani Sep 01 '11 at 20:26
  • I could recomend use SXSSF poi 3.8 [http://stackoverflow.com/a/11919450/403999][1] [1]: http://stackoverflow.com/a/11919450/403999 – Juan Rojas Aug 12 '12 at 03:19
  • jeejava.com/handling-large-data-writing-to-excel-using-sxssf-apache-poi/ – user3470953 Jul 11 '17 at 07:26

7 Answers7

67

Using SXSSF poi 3.8

package example;

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class SXSSFexample {


    public static void main(String[] args) throws Throwable {
        FileInputStream inputStream = new FileInputStream("mytemplate.xlsx");
        XSSFWorkbook wb_template = new XSSFWorkbook(inputStream);
        inputStream.close();

        SXSSFWorkbook wb = new SXSSFWorkbook(wb_template); 
        wb.setCompressTempFiles(true);

        SXSSFSheet sh = (SXSSFSheet) wb.getSheetAt(0);
        sh.setRandomAccessWindowSize(100);// keep 100 rows in memory, exceeding rows will be flushed to disk
    for(int rownum = 4; rownum < 100000; rownum++){
        Row row = sh.createRow(rownum);
        for(int cellnum = 0; cellnum < 10; cellnum++){
            Cell cell = row.createCell(cellnum);
            String address = new CellReference(cell).formatAsString();
            cell.setCellValue(address);
        }

    }


    FileOutputStream out = new FileOutputStream("tempsxssf.xlsx");
    wb.write(out);
    out.close();
}

}

It requires:

  • poi-ooxml-3.8.jar,
  • poi-3.8.jar,
  • poi-ooxml-schemas-3.8.jar,
  • stax-api-1.0.1.jar,
  • xml-apis-1.0.b2.jar,
  • xmlbeans-2.3.0.jar,
  • commons-codec-1.5.jar,
  • dom4j-1.6.1.jar

Useful link

bluish
  • 26,356
  • 27
  • 122
  • 180
Juan Rojas
  • 8,711
  • 1
  • 22
  • 30
  • 1
    This was very useful. Thank you. I didn't realise that the "Streaming" versions of the older methods existed. This has saved a massive amount of processing time. – DAB Feb 24 '14 at 10:16
  • When I tried to implement this I got error saying: "Dx trouble processing "javax/xml/parsers/DocumentBuilder.class": Ill-advised or mistaken usage of a core class (java.* or javax.*) when not building a core library" How can I overcome this error? – Osmani Sep 02 '14 at 10:50
  • 16
    **SXSSF allocates temporary files that you _must_ always clean up explicitly, by calling the dispose method.** Please see [SXSSF documentation](http://poi.apache.org/spreadsheet/how-to.html#sxssf). – lu_ko Mar 20 '15 at 10:18
  • 1
    I have the following error: java.lang.IllegalArgumentException: Attempting to write a row[14] in the range [0,14] that is already written to disk. At org.apache.poi.xssf.streaming.SXSSFSheet.createRow. APACHE POI 3.16 – diegoqueres Jul 07 '17 at 23:21
  • 1
    Nice solution. Saves time and memory a lot! Thank you very much. – Vadim Kirilchuk Feb 10 '18 at 12:53
10

Oh. I think you're writing the workbook out 944,000 times. Your wb.write(bos) call is in the inner loop. I'm not sure this is quite consistent with the semantics of the Workbook class? From what I can tell in the Javadocs of that class, that method writes out the entire workbook to the output stream specified. And it's gonna write out every row you've added so far once for every row as the thing grows.

This explains why you're seeing exactly 1 row, too. The first workbook (with one row) to be written out to the file is all that is being displayed - and then 7GB of junk thereafter.

Gian
  • 13,735
  • 44
  • 51
  • Thanks Gian. I did that cos' I ran out of heap space (set at 1024M) while creating the entire Workbook in memory before dumping it to the stream. My perm gen size is set at 256M (high enough as well). You can see that I've commented out the wb.write(bos) just before bos.close(). – Ranga Sep 01 '11 at 17:48
  • Yeah, I guessed you had been testing it both ways. My guess is that if it's not possible to generate the entire thing in one go, split it into different files and then figure out how to merge them later (possibly in Excel itself?) – Gian Sep 01 '11 at 17:51
3

You can using SXSSFWorkbook implementation of Workbook, if you use style in your excel ,You can caching style by Flyweight Pattern to improve your performance. enter image description here

Alireza Alallah
  • 2,486
  • 29
  • 35
3

Unless you have to write formulas or formatting you should consider writing out a .csv file. Infinitely simpler, infinitely faster, and Excel will do the conversion to .xls or .xlsx automatically and correctly by definition.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • 4
    Very true, but unfortunately my requirement is to generate xls/xlsx files :) – Ranga Sep 02 '11 at 03:27
  • 1
    Excel will also mess up fields that look like numbers but aren't. You need to know the CSV will be opened in Excel. – John Allen May 16 '17 at 13:52
2

You can increase the performance of excel export by following these steps:

1) When you fetch data from database, avoid casting the result set to the list of entity classes. Instead assign it directly to List

List<Object[]> resultList =session.createSQLQuery("SELECT t1.employee_name, t1.employee_id ... from t_employee t1 ").list();

instead of

List<Employee> employeeList =session.createSQLQuery("SELECT t1.employee_name, t1.employee_id ... from t_employee t1 ").list();

2) Create excel workbook object using SXSSFWorkbook instead of XSSFWorkbook and create new row using SXSSFRow when the data is not empty.

3) Use java.util.Iterator to iterate the data list.

Iterator itr = resultList.iterator();

4) Write data into excel using column++.

int rowCount = 0;
int column = 0;
while(itr.hasNext()){
 SXSSFRow row = xssfSheet.createRow(rowCount++);

 Object[] object = (Object[]) itr.next();
 //column 1     
 row.setCellValue(object[column++]); // write logic to create cell with required style in setCellValue method
 //column 2
 row.setCellValue(object[column++]);
 itr.remove();
}

5) While iterating the list, write the data into excel sheet and remove the row from list using remove method. This is to avoid holding unwanted data from the list and clear the java heap size.

itr.remove();

0

For now I took @Gian's advice & limited the number of records per Workbook to 500k and rolled over the rest to the next Workbook. Seems to be working decent. For the above configuration, it took me about 10 mins per workbook.

Ranga
  • 618
  • 1
  • 8
  • 14
0

I updated BigGridDemo to support multiple sheets.

BigExcelWriterImpl.java

package com.gdais.common.apache.poi.bigexcelwriter;

import static com.google.common.base.Preconditions.*;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

import javax.annotation.Nonnull;
import javax.annotation.Nullable;

import org.apache.commons.io.FilenameUtils;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import com.google.common.base.Function;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Iterables;

public class BigExcelWriterImpl implements BigExcelWriter {

private static final String XML_ENCODING = "UTF-8";

@Nonnull
private final File outputFile;

@Nullable
private final File tempFileOutputDir;

@Nullable
private File templateFile = null;

@Nullable
private XSSFWorkbook workbook = null;

@Nonnull
private LinkedHashMap<String, XSSFSheet> addedSheets = new LinkedHashMap<String, XSSFSheet>();

@Nonnull
private Map<XSSFSheet, File> sheetTempFiles = new HashMap<XSSFSheet, File>();

BigExcelWriterImpl(@Nonnull File outputFile) {
    this.outputFile = outputFile;
    this.tempFileOutputDir = outputFile.getParentFile();
}

@Override
public BigExcelWriter createWorkbook() {
    workbook = new XSSFWorkbook();
    return this;
}

@Override
public BigExcelWriter addSheets(String... sheetNames) {
    checkState(workbook != null, "workbook must be created before adding sheets");

    for (String sheetName : sheetNames) {
        XSSFSheet sheet = workbook.createSheet(sheetName);
        addedSheets.put(sheetName, sheet);
    }

    return this;
}

@Override
public BigExcelWriter writeWorkbookTemplate() throws IOException {
    checkState(workbook != null, "workbook must be created before writing template");
    checkState(templateFile == null, "template file already written");

    templateFile = File.createTempFile(FilenameUtils.removeExtension(outputFile.getName())
            + "-template", ".xlsx", tempFileOutputDir);
    System.out.println(templateFile);
    FileOutputStream os = new FileOutputStream(templateFile);
    workbook.write(os);
    os.close();

    return this;
}

@Override
public SpreadsheetWriter createSpreadsheetWriter(String sheetName) throws IOException {
    if (!addedSheets.containsKey(sheetName)) {
        addSheets(sheetName);
    }

    return createSpreadsheetWriter(addedSheets.get(sheetName));
}

@Override
public SpreadsheetWriter createSpreadsheetWriter(XSSFSheet sheet) throws IOException {
    checkState(!sheetTempFiles.containsKey(sheet), "writer already created for this sheet");

    File tempSheetFile = File.createTempFile(
            FilenameUtils.removeExtension(outputFile.getName())
                    + "-sheet" + sheet.getSheetName(), ".xml", tempFileOutputDir);

    Writer out = null;
    try {
        out = new OutputStreamWriter(new FileOutputStream(tempSheetFile), XML_ENCODING);
        SpreadsheetWriter sw = new SpreadsheetWriterImpl(out);

        sheetTempFiles.put(sheet, tempSheetFile);
        return sw;
    } catch (RuntimeException e) {
        if (out != null) {
            out.close();
        }
        throw e;
    }
}

private static Function<XSSFSheet, String> getSheetName = new Function<XSSFSheet, String>() {

    @Override
    public String apply(XSSFSheet sheet) {
        return sheet.getPackagePart().getPartName().getName().substring(1);
    }
};

@Override
public File completeWorkbook() throws IOException {
    FileOutputStream out = null;
    try {
        out = new FileOutputStream(outputFile);
        ZipOutputStream zos = new ZipOutputStream(out);

        Iterable<String> sheetEntries = Iterables.transform(sheetTempFiles.keySet(),
                getSheetName);
        System.out.println("Sheet Entries: " + sheetEntries);
        copyTemplateMinusEntries(templateFile, zos, sheetEntries);

        for (Map.Entry<XSSFSheet, File> entry : sheetTempFiles.entrySet()) {
            XSSFSheet sheet = entry.getKey();
            substituteSheet(entry.getValue(), getSheetName.apply(sheet), zos);
        }
        zos.close();
        out.close();

        return outputFile;
    } finally {
        if (out != null) {
            out.close();
        }
    }
}

private static void copyTemplateMinusEntries(File templateFile,
        ZipOutputStream zos, Iterable<String> entries) throws IOException {

    ZipFile templateZip = new ZipFile(templateFile);

    @SuppressWarnings("unchecked")
    Enumeration<ZipEntry> en = (Enumeration<ZipEntry>) templateZip.entries();
    while (en.hasMoreElements()) {
        ZipEntry ze = en.nextElement();
        if (!Iterables.contains(entries, ze.getName())) {
            System.out.println("Adding template entry: " + ze.getName());
            zos.putNextEntry(new ZipEntry(ze.getName()));
            InputStream is = templateZip.getInputStream(ze);
            copyStream(is, zos);
            is.close();
        }
    }
}

private static void substituteSheet(File tmpfile, String entry,
        ZipOutputStream zos)
        throws IOException {
    System.out.println("Adding sheet entry: " + entry);
    zos.putNextEntry(new ZipEntry(entry));
    InputStream is = new FileInputStream(tmpfile);
    copyStream(is, zos);
    is.close();
}

private static void copyStream(InputStream in, OutputStream out) throws IOException {
    byte[] chunk = new byte[1024];
    int count;
    while ((count = in.read(chunk)) >= 0) {
        out.write(chunk, 0, count);
    }
}

@Override
public Workbook getWorkbook() {
    return workbook;
}

@Override
public ImmutableList<XSSFSheet> getSheets() {
    return ImmutableList.copyOf(addedSheets.values());
}

}

SpreadsheetWriterImpl.java

package com.gdais.common.apache.poi.bigexcelwriter;

import java.io.IOException;
import java.io.Writer;
import java.util.Calendar;

import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.util.CellReference;

class SpreadsheetWriterImpl implements SpreadsheetWriter {

private static final String XML_ENCODING = "UTF-8";

private final Writer _out;
private int _rownum;

SpreadsheetWriterImpl(Writer out) {
    _out = out;
}

@Override
public SpreadsheetWriter closeFile() throws IOException {
    _out.close();

    return this;
}

@Override
public SpreadsheetWriter beginSheet() throws IOException {
    _out.write("<?xml version=\"1.0\" encoding=\""
            + XML_ENCODING
            + "\"?>"
            +
            "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">");
    _out.write("<sheetData>\n");

    return this;
}

@Override
public SpreadsheetWriter endSheet() throws IOException {
    _out.write("</sheetData>");
    _out.write("</worksheet>");

    closeFile();
    return this;
}

/**
 * Insert a new row
 * 
 * @param rownum
 *            0-based row number
 */
@Override
public SpreadsheetWriter insertRow(int rownum) throws IOException {
    _out.write("<row r=\"" + (rownum + 1) + "\">\n");
    this._rownum = rownum;

    return this;
}

/**
 * Insert row end marker
 */
@Override
public SpreadsheetWriter endRow() throws IOException {
    _out.write("</row>\n");

    return this;
}

@Override
public SpreadsheetWriter createCell(int columnIndex, String value, int styleIndex)
        throws IOException {
    String ref = new CellReference(_rownum, columnIndex).formatAsString();
    _out.write("<c r=\"" + ref + "\" t=\"inlineStr\"");
    if (styleIndex != -1) {
        _out.write(" s=\"" + styleIndex + "\"");
    }
    _out.write(">");
    _out.write("<is><t>" + value + "</t></is>");
    _out.write("</c>");

    return this;
}

@Override
public SpreadsheetWriter createCell(int columnIndex, String value) throws IOException {
    createCell(columnIndex, value, -1);

    return this;
}

@Override
public SpreadsheetWriter createCell(int columnIndex, double value, int styleIndex)
        throws IOException {
    String ref = new CellReference(_rownum, columnIndex).formatAsString();
    _out.write("<c r=\"" + ref + "\" t=\"n\"");
    if (styleIndex != -1) {
        _out.write(" s=\"" + styleIndex + "\"");
    }
    _out.write(">");
    _out.write("<v>" + value + "</v>");
    _out.write("</c>");

    return this;
}

@Override
public SpreadsheetWriter createCell(int columnIndex, double value) throws IOException {
    createCell(columnIndex, value, -1);

    return this;
}

@Override
public SpreadsheetWriter createCell(int columnIndex, Calendar value, int styleIndex)
        throws IOException {
    createCell(columnIndex, DateUtil.getExcelDate(value, false), styleIndex);

    return this;
}

@Override
public SpreadsheetWriter createCell(int columnIndex, Calendar value)
        throws IOException {
    createCell(columnIndex, value, -1);

    return this;
}
}
John B
  • 32,493
  • 6
  • 77
  • 98
  • I have a a problem exporting huge excel file, roughly 800,000 records. Which BigGridDemo you have updated to support multiple sheets. Is BigExcelWriterImpl or is it updated in the docjar? – user525146 Sep 11 '14 at 15:54
  • how can we merge cells? – vaibhav Jul 18 '19 at 16:23