-1

I have to extract all the text from a nested table (tables inside a table inside a table) in a Word document. I'm unable to do it using python-docx, perhaps because of my lack of knowledge.

Please suggest some code examples.

Rabindra

2 Answers

2

You will want some sort of recursion. The basic idea is:

def iter_paragraphs_of_tables(tables):
    for table in tables:
        for row in table.rows:
            for cell in row.cells:
                yield from cell.paragraphs
                yield from iter_paragraphs_of_tables(cell.tables)

# `document` is an already-loaded python-docx document, e.g. docx.Document(path)
for paragraph in iter_paragraphs_of_tables(document.tables):
    print(paragraph.text)

This is Python 3. If you're on Python 2, you'll need to expand each yield from statement into an explicit loop, for example:

yield from cell.paragraphs
# --- becomes ---
for paragraph in cell.paragraphs:
    yield paragraph
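
For completeness, here is a minimal end-to-end sketch of the same recursion, assuming python-docx is installed. The helper name nested_table_text and the file name "report.docx" in the usage note are made up for illustration; Document, .tables, .rows, .cells and .paragraphs are the python-docx names already used above. Note that document.tables lists only top-level tables, which is why the recursion through cell.tables is needed.

```python
def iter_table_paragraphs(tables):
    """Yield every paragraph in `tables`, descending into nested tables."""
    for table in tables:
        for row in table.rows:
            for cell in row.cells:
                # Paragraphs that sit directly in this cell.
                yield from cell.paragraphs
                # Tables nested in this cell go through the same function.
                yield from iter_table_paragraphs(cell.tables)


def nested_table_text(path):
    """Return the non-empty paragraph texts from all tables in a .docx file.

    Hypothetical convenience wrapper, not part of python-docx.
    """
    # Imported here so the generator above has no hard dependency on python-docx.
    from docx import Document

    document = Document(path)
    return [
        paragraph.text
        for paragraph in iter_table_paragraphs(document.tables)
        if paragraph.text.strip()
    ]
```

Usage would look like `for line in nested_table_text("report.docx"): print(line)`. Two caveats: within a cell, its own paragraphs come out before the text of any table nested in it, whatever their order in the document, and merged cells may be reported more than once by row.cells, so repeated text is possible.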
scanny
1

python-docx seems more like a library for writing and modifying .docx files, so you may want to try PyPDF2 (https://pythonhosted.org/PyPDF2/). Note that PyPDF2 reads PDF files, so the Word document would have to be saved as a PDF first. I don't really understand the table-inside-table part; I guess the tables are nested in the Word document? If that's the case, just read the file with PyPDF2 and put the words you want to keep in a table. I wish you the best with the reading.

Vodkrobaz