I am trying to remove some rows from a table in an MS Word document. Below is what the table looks like before processing:
I analyzed this table to understand its Open XML representation. Below is what the `InnerText` property of each cell contains:

Items | Description | null |
---|---|---|
Classroom | empty | Interactive Classroom... |
empty | empty | Case Study Classrooms ... |
empty | empty | Auditoria Lecture Classrooms ... |
Computers | empty | Mainframe Computer... |
empty | empty | Supercomputer... |
empty | empty | Workstation Computer... |

The middle empty column is where the image is inserted. The image and the description are in two different cells, with an invisible border between them.
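For reference, a listing like the one above can be produced by walking the rows and cells of the first table and printing each cell's `InnerText`. This is only a minimal sketch: `TableDump.Print` is a hypothetical helper name (not part of the Open XML SDK), `pageData` is assumed to be the same `byte[]` used in the code further down, and cells without text are printed as "empty" purely to mirror the listing.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class TableDump
{
    // Hypothetical helper: prints the text of every cell in the first table,
    // one row per line. The document is opened read-only, so nothing is modified.
    public static void Print(byte[] pageData)
    {
        using (var stream = new MemoryStream(pageData))
        using (var wordDoc = WordprocessingDocument.Open(stream, false))
        {
            var table = wordDoc.MainDocumentPart.Document.Body
                .Elements<Table>()
                .FirstOrDefault();
            if (table == null)
            {
                return; // no table directly under the body
            }

            foreach (var row in table.Elements<TableRow>())
            {
                // A cell that holds only an image has no text runs,
                // so its InnerText is the empty string.
                var cells = row.Elements<TableCell>()
                    .Select(c => c.InnerText.Length == 0 ? "empty" : c.InnerText);
                Console.WriteLine(string.Join(" | ", cells));
            }
        }
    }
}
```

Iterating with `Elements<TableRow>()` and `Elements<TableCell>()` skips property elements such as the row properties, which is why no index offsets are needed here.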
Below is the code to remove the items "Case Study Classrooms", "Supercomputer", "Workstation Computer", "Personal Computer" and "Tablet":

    var itemsToBeExcluded = new List<string> { "Case Study Classrooms", "Supercomputer", "Workstation Computer", "Personal Computer", "Tablet" };

    using (MemoryStream stream = new MemoryStream())
    {
        // pageData is a byte[] holding the Word file
        stream.Write(pageData, 0, (int)pageData.Length);
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(stream, true))
        {
            var table = wordDoc.MainDocumentPart.Document.Body.OfType<Table>().FirstOrDefault();
            int rowCount = 0;
            string firstColumnInnerXml = string.Empty;
            for (int t = 0; t < table.ChildElements.Count; t++)
            {
                if (table.ChildElements[t] is TableRow)
                {
                    // Skip the header row
                    if (rowCount++ != 0)
                    {
                        // Remember the inner XML of the first column when it has text;
                        // copy it into subsequent rows where that cell is empty
                        if (table.ChildElements[t].ChildElements[1].InnerText.Length > 0)
                        {
                            firstColumnInnerXml = table.ChildElements[t].ChildElements[1].InnerXml;
                        }
                        else
                        {
                            table.ChildElements[t].ChildElements[1].InnerXml = firstColumnInnerXml;
                        }

                        // Remove the row if its description starts with an excluded item name
                        foreach (var removableItem in itemsToBeExcluded)
                        {
                            if (table.ChildElements[t].ChildElements[3].InnerText.ToLower().StartsWith(removableItem.ToLower()))
                            {
                                table.ChildElements[t].Remove();
                                t--; // stay on the same index after removal
                                goto OUTERCONTINUE;
                            }
                        }
                        OUTERCONTINUE:;
                    }
                }
            }
            wordDoc.MainDocumentPart.Document.Save();
            wordDoc.Close();
        }
    }

However, after execution, this is what I get:
The images are clearly missing. Even though I am only removing the intended rows, the images in the rows that should be untouched also seem to be corrupted or removed. Can someone explain why this happens and how to fix it?