9

I'm reading an xlsx file using NPOI lib, with C#. I need to extract some of the excel columns and save the extracted values into some kind of data structure.

I can successfully read the file and get all the values from the 2nd (the first one contains only headers) to the last row with the following code:

...
workbook = new XSSFWorkbook(fs);
sheet = (XSSFSheet)workbook.GetSheetAt(0);
....
int rowIndex = 1;  //--- SKIP FIRST ROW (index == 0) AS IT CONTAINS TEXT HEADERS
while (sheet.GetRow(rowIndex) != null) {
    for (int i = 0; i < this.columns.Count; i++){
       int colIndex = this.columns[i].colIndex;
       ICell cell = sheet.GetRow(rowIndex).GetCell(colIndex);
       cell.SetCellType(CellType.String);
       String cellValue = cell.StringCellValue;
       this.columns[i].values.Add(cellValue); //--- Here I'm adding the value to a custom data structure
    }
    rowIndex++;
}

What I'd like to do now is check if the excel file is empty or if it has only 1 row in order to properly handle the issue and display a message

If I run my code against an excel file with only 1 row (headers), it breaks on

cell.SetCellType(CellType.String); //--- here cell is null

with the following error:

Object reference not set to an instance of an object.

I also tried to get the row count with

sheet.LastRowNum

but it does not return the right number of rows. For example, I have created an excel with 5 rows (1xHEADER + 4xDATA), the code reads successfully the excel values. On the same excel I have removed the 4 data rows and then I have launched again the code on the excel file. sheet.LastRowNum keeps returning 4 as result instead of 1.... I think this is related to some property bound to the manually-cleaned sheet cells.

Do you have any hint to solve this issue?

BeNdErR
  • 17,471
  • 21
  • 72
  • 103
  • Have you tried using the property `PhysicalNumberOfRows`? I wonder what value that property would have. I was unable to find any good documentation for the NPOI C# library but this library is based on the java library POI and following is the documentation for both of those values: http://poi.apache.org/apidocs/dev/org/apache/poi/hssf/usermodel/HSSFSheet.html#getLastRowNum-- and http://poi.apache.org/apidocs/dev/org/apache/poi/hssf/usermodel/HSSFSheet.html#getPhysicalNumberOfRows-- – Ida N Nov 02 '18 at 02:34

4 Answers4

8

I think it would be wise to use sheet.LastRowNum which should return the amount of rows on the current sheet

McLoving
  • 191
  • 1
  • 9
  • 2
    Please see my question, I already tried that but it is not working as expected. – BeNdErR Aug 14 '15 at 06:12
  • That is because theoretically those cells still exists in the excel but as empty cells with metadata attached to those cells depending on how you deleted the content of those cells. This will always be a problem with excel documents especially with non technical users. You would have to add custom validation for your own case testing if the data in these cells are valid – McLoving Aug 14 '15 at 09:49
  • `sheet.LastRowNum` returns last row count, not all rows length. Correct way is `sheet.PhysicalNumberOfRows`. – Gray Programmerz May 10 '22 at 07:44
5

Am I oversimplifying?

 bool hasContent = false;

 while (sheet.GetRow(rowIndex) != null)
        {
            var row = rows.Current as XSSFRow;
            //all cells are empty, so is a 'blank row'
            if (row.Cells.All(d => d.CellType == CellType.Blank)) continue;  


            hasContent = true;
        }
Gumzle
  • 847
  • 6
  • 16
1

You can retrieve the number of rows using this code:

public int GetTotalRowCount(bool warrant = false)
{
    IRow headerRow = activeSheet.GetRow(0);
    if (headerRow != null)
    {
        int rowCount = activeSheet.LastRowNum + 1;
        return rowCount;
    }
    return 0;
}
Dirk Vollmar
  • 172,527
  • 53
  • 255
  • 316
Dayong
  • 11
  • 1
0

Here is a way to get both the actual last row index and the number of physically existing rows:

    public static int LastRowIndex(this ISheet aExcelSheet)
    {
        IEnumerator rowIter = aExcelSheet.GetRowEnumerator();
        return rowIter.MoveNext()
        ? aExcelSheet.LastRowNum
        : -1;
    }

    public static int RowsSpanCount(this ISheet aExcelSheet)
    {
        return aExcelSheet.LastRowIndex() + 1;
    }

    public static int PhysicalRowsCount(this ISheet aExcelSheet )
    {
        if (aExcelSheet == null)
        {
            return 0;
        }

        int rowsCount = 0;
        IEnumerator rowEnumerator = aExcelSheet.GetRowEnumerator();
        while (rowEnumerator.MoveNext())
        {
            ++rowsCount;
        }

        return rowsCount;
    }
TodorBalabanski
  • 109
  • 2
  • 10