
I have a Word 2003 document, and I am using PowerShell to parse its content. The document contains a few lines of text at the top, a dozen tables with differing numbers of columns, and then some more text.

I expect to be able to read the document roughly like this:

  1. Read the document (create the necessary objects, etc.)
  2. Get each line of text
  3. If it is not part of a table, process it as text and Write-Output it
  4. Otherwise, if it is part of a table, get the table number (by order) and parse the output based on its columns

Below is the PowerShell script I have started writing:

```powershell
$objWord = New-Object -Com Word.Application
$objWord.Visible = $false
$objDocument = $objWord.Documents.Open($filename)
$paras = $objDocument.Paragraphs
foreach ($para in $paras) 
{ 
    Write-Output $para.Range.Text
}
```

I am not sure whether Paragraphs is what I want. Is there anything more suitable for my purpose? Right now all I get is the entire content of the document. How do I control what I get? For example, I want to get a line, determine whether it is part of a table, and take an action based on which table it is.
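The sequential approach from the question can be sketched as follows, assuming PowerShell 3.0 or later and a document without nested tables. `Get-DocumentParagraph` is a hypothetical helper name, not part of Word's object model. Because Word has no notion of lines, it walks `Paragraphs` in document order; for each paragraph whose `Range.Tables.Count` is greater than zero, it finds the enclosing table's 1-based position by matching `Range.Start` against the tables in `Document.Tables`.

```powershell
# Hypothetical helper (not part of Word's object model): walk a document's
# paragraphs in order and tag each one with the 1-based number of the table
# it sits in, or 0 for ordinary body text. Assumes no nested tables.
function Get-DocumentParagraph {
    param($Document)

    # Start offset of every table, in document order
    $tableStarts = @(foreach ($t in $Document.Tables) { $t.Range.Start })

    foreach ($para in $Document.Paragraphs) {
        $tableNumber = 0
        if ($para.Range.Tables.Count -gt 0) {
            # Match the enclosing table to its position in Document.Tables
            $start = $para.Range.Tables.Item(1).Range.Start
            $tableNumber = [array]::IndexOf($tableStarts, $start) + 1
        }
        [pscustomobject]@{
            TableNumber = $tableNumber
            # Drop the paragraph mark (CR) and, inside tables, the end-of-cell mark (BEL)
            Text        = $para.Range.Text.TrimEnd([char]13, [char]7)
        }
    }
}
```

With the document opened as in the question's script, `Get-DocumentParagraph $objDocument | Where-Object { $_.TableNumber -eq 2 }` would return only the paragraphs of the second table. Table text still arrives one paragraph at a time, so regrouping it by row and column is easier through the `Tables` collection.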

Brian Tompsett - 汤莱恩
Anoop
  • Word documents aren't organized in lines. Please take a step back and describe the problem you're trying to solve rather than what you perceive as the solution. – Ansgar Wiechers Oct 28 '12 at 01:15
  • Sure, thanks for responding. I have a Word document that contains some text and about 5 or 6 tables. Each table has between 2 and 6 columns, and the first row of each table is the header. Using PowerShell, I want to read the document, parse the contents of the tables, and output SQL statements that can be run separately against an Oracle database. I have many such documents, all similar in structure, but the number of rows in each table varies. – Anoop Oct 28 '12 at 02:43
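Once the cell texts of a table are available, turning them into the SQL statements described above can be sketched like this. `ConvertTo-InsertStatement` and the target table name `EMP` used below are hypothetical. The sketch assumes the header texts are valid Oracle column names and quotes every value as a string literal (doubling embedded single quotes), so numeric or date columns would need their own handling, for example `TO_DATE`.

```powershell
# Hypothetical helper: turn one table's header row and data rows into
# Oracle INSERT statements. The target table name is supplied by the caller.
function ConvertTo-InsertStatement {
    param(
        [string]   $TableName,
        [string[]] $Header,
        [object[]] $Rows      # each row is an array of cell strings
    )
    $columns = $Header -join ', '
    foreach ($row in $Rows) {
        # Quote every value as a string literal, doubling embedded quotes
        $values = @($row | ForEach-Object { "'" + ([string]$_).Replace("'", "''") + "'" })
        "INSERT INTO $TableName ($columns) VALUES ($($values -join ', '));"
    }
}
```

For example, `ConvertTo-InsertStatement -TableName EMP -Header ID, NAME -Rows @(,@('1', 'Ann'))` emits `INSERT INTO EMP (ID, NAME) VALUES ('1', 'Ann');`. The unary comma keeps a single row from being unrolled into separate cells.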

1 Answer


You can enumerate the tables in a Word document via the Tables collection. The Rows.Count and Columns.Count properties give the number of rows and columns in a given table. Individual cells can be accessed with the table's Cell(row, column) method, which returns a Cell object.

Here is an example that prints the value of the cell in the last row and last column of each table in the document:

```powershell
$wd = New-Object -ComObject Word.Application
$wd.Visible = $true
$doc = $wd.Documents.Open($filename)
$doc.Tables | ForEach-Object {
  $_.Cell($_.Rows.Count, $_.Columns.Count).Range.Text
}
```
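Building on that example, a whole table can be read into objects, treating the first row as the header, as in the asker's tables. This is a sketch under stated assumptions: `Get-WordCellText` and `ConvertFrom-WordTable` are hypothetical names, PowerShell 3.0 or later is assumed for `[ordered]`, and the table is assumed to be a simple grid without merged cells, because `Cell(row, column)` raises an error for cells that merging has removed. Word ends each cell's `Range.Text` with a carriage return plus a BEL character (the end-of-cell mark), so those are trimmed.

```powershell
# Hypothetical helper: text of one cell without Word's end-of-cell mark
function Get-WordCellText {
    param($Table, [int]$Row, [int]$Column)
    # Cell text ends with CR + BEL; strip them, then surrounding spaces
    $Table.Cell($Row, $Column).Range.Text.TrimEnd([char]13, [char]7).Trim()
}

# Hypothetical helper: one object per data row, properties named after row 1
function ConvertFrom-WordTable {
    param($Table)

    $rowCount = $Table.Rows.Count
    $colCount = $Table.Columns.Count

    # Header texts from the first row become property names
    $header = @(for ($c = 1; $c -le $colCount; $c++) { Get-WordCellText $Table 1 $c })

    for ($r = 2; $r -le $rowCount; $r++) {
        $record = [ordered]@{}
        for ($c = 1; $c -le $colCount; $c++) {
            $record[$header[$c - 1]] = Get-WordCellText $Table $r $c
        }
        [pscustomobject]$record
    }
}
```

For example, `ConvertFrom-WordTable $doc.Tables.Item(2)` would return one object per data row of the second table, with properties named after its header cells.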
Ansgar Wiechers
  • Thank you very much. But I have one question: how do I know I am inside a table, so that I can call the table-related logic? Is there something like an isTable() construct? – Anoop Oct 28 '12 at 15:21
  • Not sure if I understand the question. The `Tables` collection has all tables in the document and nothing else. When you access an object from that collection, that object is a table. – Ansgar Wiechers Oct 28 '12 at 23:42
  • Hmm, I think I now understand what you said. I was trying to read all the text sequentially, whether or not it was in a table, and call the table logic whenever it was. But I need not do it that way; using the Tables collection seems cleaner. Thank you very much. – Anoop Oct 29 '12 at 22:36
  • For the record, to determine whether you are in a table, use the Boolean `Selection.Information(wdWithInTable)`. (6 years late) – dcromley Dec 22 '18 at 18:59
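In a script, the same check can be made on any `Range` rather than on the `Selection`, which avoids moving the cursor. A minimal sketch: `Test-WordRangeInTable` is a hypothetical name, and 12 is the numeric value of `wdWithInTable` in Word's `WdInformation` enumeration, used directly because PowerShell does not predefine Word's constants.

```powershell
# Hypothetical helper: is the given Word Range inside a table?
function Test-WordRangeInTable {
    param($Range)
    # wdWithInTable = 12 in the WdInformation enumeration; PowerShell has
    # no predefined Word constants, so the numeric value is used directly
    $wdWithInTable = 12
    # Information is a parameterized property; PowerShell calls it like a method
    [bool]($Range.Information($wdWithInTable))
}
```

For example, `$doc.Paragraphs | Where-Object { Test-WordRangeInTable $_.Range }` would keep only the paragraphs that lie inside tables.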