0

I have to map Excel files into database values. Given that the Excel files have a lot of rows I'm using https://github.com/monitorjbl/excel-streaming-reader because otherwise, with Apache POI, I receive memory error. This is the old code that doesn't work with big file:

/**
 * Private method called from single and multiple file upload to retrieve the excel fields values
 * This method works if THE EXCEL FILE IS WITHOUT empty row!!
 * @param file
 * @return
 * @throws IOException
 */
private ExcelField[][] getExcelField(MultipartFile file) throws IOException{
    try{
        ArrayList<ArrayList<ExcelField>> valuesMatrix= new ArrayList<ArrayList<ExcelField>>();
        Workbook wb = WorkbookFactory.create(file.getInputStream());
        //Sheet to use
        Sheet firstSheet = wb.getSheetAt(0);
        //Assuming that the first row is without blank cell and it has the maximum number of columns.
        //numCol IS NECESSARY BECAUSE IF THERE IS A EMPTY CELL AT THE LAST POSITION OF THE ROW THE GETLASTCELLNUM RETURN 
        //THE NUMBER OF ROWS WITHOUT THE LAST
        int numCol=firstSheet.getRow(0).getLastCellNum();
        int numRow=firstSheet.getPhysicalNumberOfRows();
        for(int rw=0;rw<numRow;rw++) {
            Row row=firstSheet.getRow(rw);
            ArrayList<ExcelField> rowValues=new ArrayList<ExcelField>();
            for(int cn=0; cn<numCol; cn++) {
                Cell cell = row.getCell(cn, Row.CREATE_NULL_AS_BLANK);
                rowValues.add(getCellValue(cell));                  
                //Print into console the fields values
                //System.out.print(getCellValue(cell).getValue()!=null?getCellValue(cell).getValue().toString():" null"+" ");
            }
            valuesMatrix.add(rowValues);
            //System.out.println();
        }
        wb.close();         
        return convert2DimensionalArrayToMatrix(valuesMatrix);
    } catch (EncryptedDocumentException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (InvalidFormatException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return null;
}

In Excel files there are formulas, dates, double, merged cells etc. and in this moment I have problem, only with new method, with formulas and dates, often I receive str or other type unknown exception.

Caused by: java.lang.UnsupportedOperationException: Unsupported cell type 'str'
at com.monitorjbl.xlsx.impl.StreamingCell.getCellType(StreamingCell.java:106)
at com.mkyong.services.FileServicesImpl.getCellValue(FileServicesImpl.java:193)
at com.mkyong.services.FileServicesImpl.getExcelFieldSpeedImproved(FileServicesImpl.java:176)
at com.mkyong.services.FileServicesImpl.singleFileOpen(FileServicesImpl.java:86)

The date, instead, return null value. I don't care about the formula but I'm interesting in value only because I have to store it in database. This is my code:

@Override
public Response<ExcelField[][]> singleFileOpen(MultipartFile file) throws FileEmptyException, FileUploadException{
    if (!file.isEmpty()) {
        try {
            openedFile=file;
            //Retrieve the fields values
            //              ExcelField[][] valuesMatrix=getExcelField(openedFile);
            ExcelField[][] valuesMatrix=getExcelFieldSpeedImproved(openedFile);
            //              //return the Response with status ad excel fields
            return new Response<ExcelField[][]>(HttpStatus.OK.value(),valuesMatrix);
        } catch (Exception e) {
            throw new FileUploadException("You failed to read " + openedFile.getOriginalFilename(), e);
        }
    } else {
        throw new FileEmptyException("You failed to read" );
    }
}

and the main method:

private ExcelField[][] getExcelFieldSpeedImproved(MultipartFile file) throws IOException{
    ArrayList<ArrayList<ExcelField>> valuesMatrix= new ArrayList<ArrayList<ExcelField>>();
    InputStream is = file.getInputStream();
    StreamingReader reader = StreamingReader.builder()
            .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
            .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
            .sheetIndex(0)        // index of sheet to use (defaults to 0)
            .read(is);            // InputStream or File for XLSX file (required)
    for (Row r : reader) {
        ArrayList<ExcelField> rowValues=new ArrayList<ExcelField>();
        for (Cell c : r) {
            rowValues.add(getCellValue(c));
            System.out.print((String)getCellValue(c).getValue()+" ");
        }
        valuesMatrix.add(rowValues);
        System.out.println();

    }  
    return convert2DimensionalArrayToMatrix(valuesMatrix);
}

and the type choice:

/**
 * Retrieve the value in one excel field
 * @param cell
 * @return
 */
private ExcelField getCellValue(Cell cell) {

        switch (cell.getCellType()) {
        case Cell.CELL_TYPE_STRING:
            return new ExcelField(ExcelTypeEnum.STRING,cell.getStringCellValue());
        case Cell.CELL_TYPE_BOOLEAN:
            return new ExcelField(ExcelTypeEnum.BOOLEAN,cell.getBooleanCellValue());
        case Cell.CELL_TYPE_NUMERIC:
            return new ExcelField(ExcelTypeEnum.NUMERIC,cell.getNumericCellValue());
        case Cell.CELL_TYPE_ERROR:
            return new ExcelField(ExcelTypeEnum.ERROR,cell.getErrorCellValue());
        case Cell.CELL_TYPE_FORMULA:
            return new ExcelField(ExcelTypeEnum.FORMULA,cell.getCachedFormulaResultType()); 
        default:
            return new ExcelField(ExcelTypeEnum.BLANK,null);
        }
}

What is the best way to retrieve Excel values?

Update: with this code I resolve with old method and dates and formulas:

    /**
 * Retrieve the value in one excel field
 * @param cell
 * @return
 */
private ExcelField getCellValue(int cellType, Cell cell) {
    switch (cellType) {
    case Cell.CELL_TYPE_STRING:
        return new ExcelField(ExcelTypeEnum.STRING,cell.getStringCellValue());
    case Cell.CELL_TYPE_BOOLEAN:
        return new ExcelField(ExcelTypeEnum.BOOLEAN,cell.getBooleanCellValue());
    case Cell.CELL_TYPE_NUMERIC:
        if (HSSFDateUtil.isCellDateFormatted(cell))
            //SimpleDateFormat dateFormat = new SimpleDateFormat("dd-MM-yyyy");
            //System.out.print(dateFormat.format(cell.getDateCellValue()) + "\t\t");
                return new ExcelField(ExcelTypeEnum.NUMERIC,cell.getDateCellValue());
        return new ExcelField(ExcelTypeEnum.NUMERIC,cell.getNumericCellValue());
    case Cell.CELL_TYPE_ERROR:
        return new ExcelField(ExcelTypeEnum.ERROR,cell.getErrorCellValue());
    case Cell.CELL_TYPE_FORMULA:
            return getCellValue(cell.getCachedFormulaResultType(),cell);            
    case Cell.CELL_TYPE_BLANK:
        return new ExcelField(ExcelTypeEnum.BLANK,null);
    default:
        return new ExcelField(ExcelTypeEnum.BLANK,cell.getStringCellValue());
    }
}
luca
  • 3,248
  • 10
  • 66
  • 145
  • Looks like a buggy wrapper around Apache POI. Why not call the Apache POI XLSX event-streaming code yourself? – Gagravarr Oct 06 '15 at 22:21
  • I posted the old code, I had to wait a lot of time on Workbook wb = WorkbookFactory.create(file.getInputStream()); and then I receive memory exception – luca Oct 07 '15 at 06:53
  • WorkbookFactory uses the UserModel code, which is easy but uses memory. The [POI XSSF low-memory streaming code is documented here](http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api) – Gagravarr Oct 07 '15 at 08:08
  • maybe I have resolved with this link https://mail-archives.apache.org/mod_mbox/poi-user/200905.mbox/%3C4A1308B6.8010507@invest-faq.com%3E – luca Oct 07 '15 at 13:19
  • You'd be much better [following the Apache POI example](https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/XLSX2CSV.java), which shows just how easy it can be to do it right! – Gagravarr Oct 07 '15 at 15:19
  • is it possible to scan from one row to another? For example scan only fourth row or first three row...thanks – luca Oct 08 '15 at 08:13
  • Sure, just load it up into memory with XSSF Usermodel! If you want to process it in a streaming way, then as the name suggests, you have to catch the things of interest as they stream past... – Gagravarr Oct 08 '15 at 08:24
  • How can I get an ArrayList> created into MyXSSFSheetHandler ? I tried both get method and putting information into another class but the result is an empty object – luca Oct 09 '15 at 14:02

0 Answers0