How to get all the text in a nested table using python?

Question

I have to extract all the text from a nested table (tables inside a table inside a table) in a Word document. I haven't been able to do it with python-docx, maybe because of my lack of knowledge.

Please suggest some code examples.

score 2 · Accepted Answer · answered Oct 13 '20 at 17:38

You will want some sort of recursion. The basic idea is:

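from docx import Document  # provided by the python-docx package

def iter_paragraphs_of_tables(tables):
    """Yield every paragraph in `tables`, descending into nested tables."""
    for table in tables:
        for row in table.rows:
            for cell in row.cells:
                # Paragraphs directly in this cell, then those in tables nested inside it.
                yield from cell.paragraphs
                yield from iter_paragraphs_of_tables(cell.tables)

# "example.docx" is a placeholder path; substitute your own file.
document = Document("example.docx")
for paragraph in iter_paragraphs_of_tables(document.tables):
    print(paragraph.text)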
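
One thing that may need adjusting: as far as I know, python-docx reports a merged cell once for each grid position it covers, so the loop above can yield the same text more than once. Below is a minimal sketch of a variant that skips cells it has already visited. The function name iter_unique_cell_text is made up for this example, and it assumes each cell exposes its underlying XML element as the private attribute _tc, which is not part of the public API and could change.

```python
def iter_unique_cell_text(tables, _seen=None):
    """Yield paragraph text from `tables` and nested tables, visiting each cell once."""
    seen = set() if _seen is None else _seen
    for table in tables:
        for row in table.rows:
            for cell in row.cells:
                # Assumption: python-docx cells keep their XML element in the
                # private attribute `_tc`; fall back to the cell object itself.
                key = getattr(cell, "_tc", cell)
                if key in seen:
                    continue  # merged cell that was already visited
                seen.add(key)
                for paragraph in cell.paragraphs:
                    yield paragraph.text
                # Share the same `seen` set with the nested tables.
                yield from iter_unique_cell_text(cell.tables, seen)
```

It is used the same way as the generator above, for example print("\n".join(iter_unique_cell_text(document.tables))).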

This is Python 3. If you're on Python 2, you'll need to expand each yield from statement into an explicit loop, for example:

yield from cell.paragraphs
# --- becomes ---
for paragraph in cell.paragraphs:
    yield paragraph
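
Applying that expansion to both yield from lines, the whole generator might look like the sketch below. It keeps the same function name and should behave the same on Python 3 as well.

```python
def iter_paragraphs_of_tables(tables):
    """Yield every paragraph in `tables`, descending into nested tables."""
    for table in tables:
        for row in table.rows:
            for cell in row.cells:
                # Expanded form of: yield from cell.paragraphs
                for paragraph in cell.paragraphs:
                    yield paragraph
                # Expanded form of the recursive yield from
                for paragraph in iter_paragraphs_of_tables(cell.tables):
                    yield paragraph
```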

scanny

Thanks @scanny. I'll try this. I'm using python3. – Rabindra Oct 14 '20 at 04:55
Thank you so much. This worked for me. Had to do some changes, but the idea worked. – Rabindra Oct 14 '20 at 14:27

score 1 · Answer 2 · answered Oct 13 '20 at 12:23

python-docx seems more like a library for writing and modifying docx files, so you may want to try PyPDF2 (https://pythonhosted.org/PyPDF2/). I don't really understand the table-inside-a-table part, though. I guess the table is nested in the Word document? If that's the case, just read the file with PyPDF2 and put the words that you want to keep in a table. I wish you the best time reading.

Vodkrobaz

Thanks, will definitely check that. Hopefully, it's not just for pdfs. – Rabindra Oct 13 '20 at 12:29
