
I am using the win32com Python module to target a Word .doc table and then extract all Sentences/ListParagraphs from it.

I am able to successfully get all my content using doc.Paragraphs. I then try to run:

EDITED:

    doc = word.Documents.Open(path)
    list = doc.Paragraphs

    for x in list:
        if str(x.Style) == "Normal" and x != "":
            # do stuff

This does not detect empty or whitespace-only lists and paragraphs. I also tried using x.isspace() to check for whitespace, but it always returned False.

I have had a run-in with \r\n\t\x07\x0b characters before, which seem to come through when text is extracted from COM class objects. They cause all sorts of weird issues when converted to strings. Could it be something similar?

Thanks

The Mob
  • `if str(x.Style) == "Normal" and not "":` is a nonsense line. The first part is okay(-ish -- why does the style name matter?) but the truthy/falsy part `not ""` will just always return `True`, and, due to the `and`, do nothing, positive or negative. Are you forgetting something there? – Jongware Jan 30 '20 at 23:04
  • Ah -- `x.isspace()` always fails because `x` is never a space, it's a *paragraph*. The list you get returned is a list of paragraphs so you want to check `text = x.Range.Text`. – Jongware Jan 30 '20 at 23:08
  • Depending on the ```x.Style``` I perform certain tasks, i.e. if it's a Bulleted List, do X; if it has a Normal style, do Y. I could be forgetting something, but the main goal is to check the style and whether the paragraph is empty, i.e. ```""``` – The Mob Jan 31 '20 at 01:34
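
Following the suggestion in the comments to test `x.Range.Text` rather than the paragraph object itself, here is a minimal sketch of how the check might look. The helper names `is_blank` and `nonblank_normal_paragraphs` are hypothetical, and treating `\x07` as a marker to discard is an assumption based on the characters mentioned in the question; the rest relies only on `doc.Paragraphs`, `x.Style` and `x.Range.Text` as used above.

```python
def is_blank(text):
    """Return True if text holds only whitespace and Word control marks.

    Assumption: \x07 (seen in text pulled from table cells) carries no
    content. str.strip() already covers \r, \n, \t and \x0b.
    """
    return text.replace("\x07", "").strip() == ""


def nonblank_normal_paragraphs(doc):
    """Yield the text of every non-blank paragraph styled "Normal".

    `doc` is expected to be the object returned by word.Documents.Open(path).
    """
    for para in doc.Paragraphs:
        # Compare the paragraph's text, not the COM paragraph object.
        text = para.Range.Text
        if str(para.Style) == "Normal" and not is_blank(text):
            yield text
```

With this, `for text in nonblank_normal_paragraphs(doc): ...` would skip paragraphs whose text is only a paragraph mark or cell marker, which a comparison against `""` does not catch.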
