Every day I receive an email with a Word document attached. All of the text in each document sits inside a table. I have hundreds of these documents (one per day). I want to use Python to open each document, copy the text I need, and paste it into an Excel workbook. However, I am stuck on the very first step: I can't pull the text out of the Word document. I am trying to use the python-docx module, but I can't figure out how to read the text from the tables.
I modified a getText function from the Python intro book I am reading, but it doesn't seem to work. Am I even on the right track here?
import docx
fullText = []
def getText(filename):
    doc = docx.Document(filename)
    for table in doc.Tables:
        for row in table.Rows:
            for cell in row.Cells:
                fullText.append(cell.text)
    return '\n'.join(fullText)
Okay, after looking at this other question I have realized that I am actually having a different problem than I thought. I have made changes and have the following code:
import docx
fullText = []
doc = docx.Document('c:\\btest\\January18.docx')
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            fullText.append(cell.text)
'\n'.join(fullText)
print(fullText)
It prints this:
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
The thing is, the cells in the Word document's tables are not blank, so they should not come back as empty strings. What am I doing wrong?
A sample input document is here
I am trying to pull certain rows of text out of this document, then paste and format the text the way I want. However, I can't even access the text in the Word document...
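One way to narrow this down, independent of python-docx, might be to read the table cells straight from the XML inside the .docx file (a .docx is a zip archive containing word/document.xml). The sketch below is only a diagnostic under a few assumptions: `raw_cell_texts` is a hypothetical helper name, not part of python-docx, and the guess being tested is that the cell text is wrapped in some element (for example a content control, `w:sdt`) so that it is not a paragraph sitting directly in the cell. If this returns the expected text while `cell.text` returns empty strings, that guess becomes more plausible.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def raw_cell_texts(docx_file):
    """Return one string per table cell (w:tc), read directly from the XML.

    Hypothetical diagnostic helper, not part of python-docx.
    docx_file may be a path or a binary file-like object.
    Every w:t descendant of a cell is joined, including text inside
    wrapper elements such as w:sdt. Paragraph breaks are not preserved,
    and a nested table's text shows up in the outer cell as well.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    texts = []
    for cell in root.iter(f"{{{W_NS}}}tc"):
        pieces = [t.text or "" for t in cell.iter(f"{{{W_NS}}}t")]
        texts.append("".join(pieces))
    return texts
```

Calling `raw_cell_texts('c:\\btest\\January18.docx')` and comparing the result with the list printed above should show whether the text is present in the file but simply not reachable through `cell.text`.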