How to always skip specified number of Rows using DocumentFormat.OpenXml

Question

I iterate through Rows using DocumentFormat.OpenXml, sometimes i need to start from 4th, 8th, 11th row. I define how many rows should be skipped with "skipRows" and "If" below let's me skip unnecessary rows:

var rows = sheet.Descendants<Row>();
foreach (Row row in rows)
{
    if (dataRowIndex < skipRows)
    {
         dataRowIndex++;
         continue;
    }

The problem is that sometimes when row is completely empty it automatically doesn't iterate through it. Sometimes when it's empty it iterates through it. It always iterates when there is any cell written in said row. Why is that? How can I ensure that it always skips for example 6 rows no matter if there is any data in cells in those rows?

Not sure if it works, but have you tried the ".Skip()"? :) : https://stackoverflow.com/questions/6429728/is-it-possible-to-do-start-iterating-from-an-element-other-than-the-first-using — ChrisRun, Apr 01 '19 at 08:21
I think the ".Skip()" function skips rows after making a list. So it won't be a change. The problem is that "sheet.Descendants()" sometimes skips empty rows and sometimes doesn't. — Pomme De Terre, Apr 01 '19 at 08:32

score 1 · Accepted Answer · answered Apr 05 '19 at 11:40

Sometimes when it's empty it iterates through it. It always iterates when there is any cell written in said row. Why is that?

This is due to the way the XML schema is defined. A row is completely optional in the schema; if there's no data in a row then there's no requirement to write it to the XML (although there's nothing to stop it being written either). If there is a cell in a row then the row must be written to the XML as a cell is a child of a row; without the row there would be nowhere to write the cell.

How can I ensure that it always skips for example 6 rows no matter if there is any data in cells in those rows?

You can use the RowIndex property of the Row to find out the actual index of the Row being read.

The following example should do what you're after:

using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, false))
{
    WorkbookPart workbookPart = document.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();

    SharedStringTablePart stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();

    var rows = sheetData.Descendants<Row>();

    foreach (Row row in rows)
    {
        if (row.RowIndex <= skipRows)
        {
            continue;
        }

        //this is just to show that it's outputting from the first non-skipped row
        Cell cell = row.GetFirstChild<Cell>();
        string contents;

        if (cell.DataType == CellValues.SharedString)
        {
            int index = int.Parse(cell.CellValue.InnerText);
            contents = stringTable.SharedStringTable.ElementAt(index).InnerText;
        }
        else
        {
            contents = cell.InnerText;
        }
        Console.WriteLine(contents);
    }
}

How to always skip specified number of Rows using DocumentFormat.OpenXml

1 Answers1