
I have a simple document with one table in it, and I would like to read the content of its cells. I found many tutorials for writing, but none for reading.

I suppose I should enumerate the sections, but how do I know which one contains a table?

var document = DocX.Create(@"mydoc.docx");

var s = document.GetSections();
foreach (var item in s)
{

}
Emaro
vico

2 Answers


I'm using the following namespaces and aliases:

using System.IO;
using System.Runtime.InteropServices;
using excel = Microsoft.Office.Interop.Excel;
using word = Microsoft.Office.Interop.Word;

You can grab the tables specifically through document.Tables, as in this code:

        private void WordRunButton_Click(object sender, EventArgs e)
        {
            var excelApp = new excel.Application();
            excel.Workbooks workbooks = excelApp.Workbooks;
            var wordApp = new word.Application();
            word.Documents documents = wordApp.Documents;

            // You don't want your computer to actually load each file visibly; that would ruin performance.
            wordApp.Visible = false;
            excelApp.Visible = false;

            string[] fileDirectories = Directory.GetFiles("Some Directory", "*.doc*",
                   SearchOption.AllDirectories);

            foreach (var item in fileDirectories)
            {
                // Open read-only: the tables are only read, so the source file is never modified
                word._Document document = documents.Open(item, ReadOnly: true);
                int tableCount = 1;

                foreach (word.Table table in document.Tables)
                {
                    // Not needed if you're not going to save each table individually
                    string appendName = Path.GetFileNameWithoutExtension(item) + " Table " + tableCount + ".xlsx";

                    excel.Workbook workbook = workbooks.Add();
                    excel._Worksheet worksheet = (excel.Worksheet)workbook.Sheets[1];

                    for (int row = 1; row <= table.Rows.Count; row++)
                    {
                        for (int col = 1; col <= table.Columns.Count; col++)
                        {
                            // Can throw a COMException if merged/split cells mean (row, col) doesn't exist
                            word.Cell cell = table.Cell(row, col);
                            string text = cell.Range.Text;

                            // Clean strips nonprinting characters such as Word's end-of-cell marker
                            worksheet.Cells[row, col] = excelApp.WorksheetFunction.Clean(text);
                        }
                    }

                    // The file format can be whatever you want;
                    // just make sure it matches the extension set in appendName above.
                    workbook.SaveAs(Path.Combine("Some Directory", appendName), excel.XlFileFormat.xlWorkbookDefault);

                    workbook.Close();
                    Marshal.ReleaseComObject(worksheet);
                    Marshal.ReleaseComObject(workbook);

                    tableCount++;
                }

                document.Close();
                Marshal.ReleaseComObject(document);
            }

            // Microsoft apps are picky with memory. Make sure you close and release each instance once you're done with it.
            // Failure to do so will leave many lingering apps in the background.
            workbooks.Close();
            excelApp.Quit();
            Marshal.ReleaseComObject(workbooks);
            Marshal.ReleaseComObject(excelApp);

            // Quit each application only once; the cast avoids the ambiguity between the Quit method and the Quit event
            Marshal.ReleaseComObject(documents);
            ((word._Application)wordApp).Quit();
            Marshal.ReleaseComObject(wordApp);
        }

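The document is the actual Word document type (word._Document), which is what documents.Open returns. Make sure you check for merged or split cells if your tables have them: table.Cell(row, col) fails for a position that doesn't exist in that row.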
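If you only need the cell text and would rather not start Excel just to call Clean, here is a minimal sketch of an alternative. It assumes the usual Word behaviour that every cell's Range.Text ends with a paragraph mark followed by the end-of-cell marker ("\r\a", characters 13 and 7). The class and method names (CellText, StripEndOfCellMarker) are hypothetical and not part of the Interop API. Unlike Clean, which removes every nonprinting character (so paragraph breaks inside a cell disappear), this trims only the trailing marker.

```csharp
using System;

// Hypothetical helper, not part of Microsoft.Office.Interop.Word.
public static class CellText
{
    // Word ends each cell's Range.Text with "\r\a": a paragraph mark (U+000D)
    // followed by the end-of-cell marker (U+0007).
    // TrimEnd removes only trailing '\r' and '\a' characters, so paragraph
    // breaks in the middle of a cell are preserved.
    public static string StripEndOfCellMarker(string raw)
    {
        if (raw == null)
        {
            return string.Empty;
        }
        return raw.TrimEnd('\r', '\a');
    }
}
```

Inside the inner loop you would then write worksheet.Cells[row, col] = CellText.StripEndOfCellMarker(cell.Range.Text); (or just keep the string if you don't need Excel at all).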

Hope this helps!

k794
  • The rows and columns start at 1 in the Microsoft Interop libraries; that's not something I did to skip the first row and column. – k794 Sep 24 '18 at 18:38
  • And how do I declare and open a Word.Document? – vico Sep 25 '18 at 04:42
  • I added a more complete example. Bear in mind that this will put each table in a Word doc into its own Excel file. If you want all of the tables in one location, just adjust the code so that it doesn't create its own file for each table. – k794 Sep 25 '18 at 12:35
  • The statement wordApp.Quit() is crashing here. I removed it, and your loop keeps working. +1, very nice – Goodies Jul 20 '19 at 20:14

If you only have one table in the document, it should be rather simple. Try this:

DocX doc = DocX.Load("C:\\Temp\\mydoc.docx");
Table t = doc.Tables[0];
//read cell content (DocX collections are 0-based)
string someText = t.Rows[0].Cells[0].Paragraphs[0].Text;

You can loop through the table rows and the cells inside each row, and also through the Paragraphs inside each cell if it has more than one paragraph. You can do that with simple for loops:

for (int i = 0; i < t.Rows.Count; i++)
{
    for (int j = 0; j < t.Rows[i].Cells.Count; j++)
    {
        someText = t.Rows[i].Cells[j].Paragraphs[0].Text;
    }
}

Hope it helps.

SmolkoMatic