I am using the win32com Python module to target a Word .doc table and then extract all Sentences/ListParagraphs from it.
I am able to successfully get all my content using doc.Paragraphs. I then try to run:
EDITED:
doc = word.Documents.Open(path)
list = doc.Paragraphs
for x in list:
    if str(x.Style) == "Normal" and x != "":
        # do stuff
This does not detect empty or whitespace-only Lists and Paragraphs. I also tried using x.isspace() to check for whitespace, but it always returned False.
I have had a run-in with \r\n\t\x07\x0b characters before, which seem to come along when text is extracted from COM class objects. They cause all sorts of weird issues when converting them to strings. Could it be something similar?
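To show what I suspect, here is a minimal sketch that works on plain strings only, so it runs without Word. The names is_blank and WORD_CONTROL_CHARS are hypothetical helpers of mine, and I am assuming the raw paragraph text would come from something like x.Range.Text rather than from the paragraph object itself. Note that "\x07" is not whitespace as far as Python is concerned, so isspace() returns False for a string containing it, and it also returns False for an empty string.

```python
# Characters I have seen in text pulled out of Word via COM (an assumption
# based on my earlier experience, not an exhaustive list):
# \r paragraph mark, \n newline, \t tab, \x07 table cell marker, \x0b line break.
WORD_CONTROL_CHARS = "\r\n\t\x07\x0b"


def is_blank(text):
    """Return True if nothing is left after stripping control characters and spaces."""
    return text.strip(WORD_CONTROL_CHARS + " ") == ""


# Plain-string stand-ins for what a paragraph's text might look like.
print(is_blank("\r\x07"))      # only a paragraph mark and a cell marker
print(is_blank("Hello\r"))     # real text followed by a paragraph mark
print("\r\x07".isspace())      # what isspace() makes of the same "empty" text
```

If this is the cause, the check in my loop would have to be applied to the extracted text, not to x directly.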
Thanks