Below, as commented, is an example of removing the extra “empty” rows from the DataTable.
There are a couple of ways to approach this. One is to clean the Excel file of the extras rows as I am aware that using Excel’s UsedRange
property has a nasty habit of flagging rows that have no apparent data as NOT empty. This may be from formatting or other issues. I have a solution for that if you want to go down that rabbit hole. Fastest method to remove Empty rows and Columns From Excel Files using Interop
However, this solution was heavily based on LARGE Excel files with many rows and columns. If the files are not large, then the solution below should work.
Even though your posted code has some much-needed range checking (more below), using the posted code, I was able to read an Excel file that produced extra “empty” rows at the end. It is these rows we want to remove from the DataTable.
I am sure there are other ways to do this, however, a basic approach would be to simply loop through the DataTable
rows, and check each cell… and, if ALL the cells on that row are “empty” then remove that row. This is the approach I used below.
To help get this done quickly, keeping this to one loop through the table is a goal. In other words, we want to loop through the table and remove rows from that SAME table. This will mean that extra care is needed. Obviously a foreach
loop through the rows will not work.
However, a simple for
loop will work, as long as we start at the bottom and work up. AND we need to make sure and NOT use dt.Rows.Count
as an “ending” condition in the for
loop through the rows. As this could possibly cause some problems. This is easily avoided by simply fixing the row count to a variable and use it as an ending condition. This will allow the code to delete the rows from the bottom up and not have to worry about getting the row and loop indexes mixed up.
A walkthrough of the code would go like… First a bool
variable allEmpty
is created to indicate if ALL the cells in a row are “empty.” For each row, we will set this variable to true
to indicate that the row is empty. Then a loop through each cell of that row and check if each cell is NOT empty. If at least one of the cells in that row is NOT empty, then, we set allEmpty
to false
and break
out of the columns loop. After the columns loop is exited, the code simply checks to see if that row is empty and if so, deletes that row.
It should be noted in the last if
statement that checks for the empty row. When the FIRST non-empty row is found, then in this context where we are only wanting to delete the last “empty” rows, then, we are done and can break out of the rows loop and exit since we are only looking for the LAST empty rows.
If you comment out the else
portion of the bottom if
code, then, the code will remove ALL the empty rows.
bool allEmpty;
int rowCount = dt.Rows.Count - 1;
for (int dtRowIndex = rowCount; dtRowIndex >= 0; dtRowIndex--) {
allEmpty = true;
for (int dtColIndex = 0; dtColIndex < dt.Columns.Count; dtColIndex++) {
if (dt.Rows[dtRowIndex].ItemArray[dtColIndex].ToString() != "") {
allEmpty = false;
break;
}
}
if (allEmpty) {
dt.Rows.RemoveAt(dtRowIndex);
}
else {
break;
}
}
Eye brow raiser for the posted code…
The current posted code makes some dangerous assumptions in relation to what is returned from UsedRange
and the dt
column indexes. Example, the code starts by grabbing the worksheets UsedRange.
range = worksheet.UsedRange;
We obviously NEED this info, however, at this point in the code, we have NO clue how many rows or columns have been returned. Therefore, when the code gets to the second for
loop through the columns... The code uses this column
index as an index into the data row dr
…
dr[column - 1] = …
Since the data table dt
only has 6 columns, this is a risky assignment without checking the index range. Since used range grabs the used cells, what if a user added some text into column 7, 8 or ANY cell greater than 6, then this code will crash and burn. The code MUST check the number of columns returned from UsedRange
to avoid an index out of range exception.
There are a couple of ways you could fix this. One would be to set the column loop ending condition to the number of columns in the data table. Unfortunately, this still leads to checking the number of columns returned by the used range considering it may return less columns than the data table has and the code will crash on the same line above only on the right side of the “=” equation.
= Convert.ToString((range.Cells[row, column] as Excel.Range).Value);
In both cases it is clear your code needs to check these ranges BEFORE you start the looping through the used range.
Lastly, if you must use Excel Interop, which is usually a last option case, then you need to minimize the possibility of leaking the COM objects (leaking resources), such that when something goes wrong your code still releases the COM objects the code creates. When using Interop, I suggest you wrap all the Excel code in a try/catch/finally
statement. In the try
portion you have the code. And the Finally
portion is where you close the excel workbook, quit the excel application and release the COM objects.
You will need to decide what to do in the catch
portion of code. A simple message box displayed to the user may suffice to tell the user there was an error, the user clicks OK, and the code executes the finally
code. Point being, that you want to display something instead of simply swallowing the error.
This approach may look something like…
Microsoft.Office.Interop.Excel.Application ExcelApp = null;
Microsoft.Office.Interop.Excel.Workbook Workbook = null;
Microsoft.Office.Interop.Excel.Worksheet Worksheet = null;
try {
// code that works with excel interop
}
catch (Exception e) {
MessageBox.Show("Error Excel: " + e.Message);
}
finally {
if (Worksheet != null) {
Marshal.ReleaseComObject(Worksheet);
}
if (Workbook != null) {
//Workbook.Save();
Workbook.Close();
Marshal.ReleaseComObject(Workbook);
}
if (ExcelApp != null) {
ExcelApp.Quit();
Marshal.ReleaseComObject(ExcelApp);
}
}
I hope this makes sense and helps.