2

I am parsing Excel using Apache POI. Is it possible to get Apache POI to treat empty strings as null and ignore them?

There can be excel sheets that sets millions of cells as Text as attached below, causing all the cells to be interpreted as empty strings by Apache POI. Since there are millions of empty string cells, checking cell by cell can cause great performance issue:

enter image description here

Is there anyway where Apache POI can ignore these empty string cells so we don't have to individually read cell by cell to check if the cell is empty string?

InputStream is = new ByteArrayInputStream(content);
Workbook wb = WorkbookFactory.create(is);
wb.setMissingCellPolicy(RETURN_BLANK_AS_NULL);
for (int i = 0; i < wb.getNumberOfSheets(); i++) {
   Sheet sheet = wb.getSheetAt(i);
   for (int i = firstRowNumber; i <= sheet.getLastRowNum(); i++) {
   Row row = sheet.getRow(i);   // Row is returning all the empty strings but I would like to ignore them and treat them as null (empty cell)
   // ...
   }
}
stackyyflow
  • 767
  • 3
  • 11
  • 30

1 Answers1

0

You can do like this to ignore the empty String,

String value = sheet.getRow(i).getCell(0).getStringCellValue();
if(value == null || value.equalsIgnoreCase("")) {
    //Your code
}
Nandan A
  • 2,702
  • 1
  • 12
  • 23
  • 1
    Thank you for sharing :) I have edited my question to be clearer, we may have millions of such cell so wondering if there is a way that we can avoid checking cell by cell? – stackyyflow Dec 19 '21 at 08:43
  • So your ask is like if a Row is empty then how to ignore that? – Nandan A Dec 19 '21 at 08:46
  • 1
    Yes, somehow Apache POI is interpreting cells as empty Strings when excel sheet is having "Text" format even though the cell itself is empty. – stackyyflow Dec 19 '21 at 08:51
  • 1
    Sorry.. as per my knowledge there is no way to check from `Row` level. – Nandan A Dec 19 '21 at 09:05
  • 2
    `getStringCellValue` never returns null, so the null check is uneceassary. But I think with the current missing cell policy he choose he has to check if `getCell` returns null. If the workbook has numeric cells you need to check the cell type, because `getStringCellValue` will throw an exception for numeric cells. – magicmn Dec 19 '21 at 09:11
  • @magicmn you are correct. We should perform `Cell` check before getting value. Like `.getCellType() == Cell.CELL_TYPE_NUMERIC` or `.getCellType() == Cell.CELL_TYPE_STRING` – Nandan A Dec 19 '21 at 09:17