
I'm using the python-docx library and I'm struggling to find the name of the heading under which a table lies. Is there a way to get each table together with the heading it sits under?

from docx import Document
from docx.shared import Inches
document = Document('test.docx')

tabs = document.tables
Ali Asad
  • Can you give a sample example how your file looks and what you are expecting? – yabhishek Oct 06 '19 at 06:13
  • Sure @yabhishek, it looks like a typical Word document: there are headings, and under them some paragraphs or possibly subheadings. A table could sit directly under heading 1, or under a subheading such as 1.1. Please let me know if this description is enough or whether I should explain more. Thank you :) – Ali Asad Oct 06 '19 at 06:16
  • If you search (Google) on "python-docx iter_block_items" I think you'll find the insights you're looking for. This has to do with paragraphs and tables being gathered separately by default (for at least some good reasons). – scanny Oct 06 '19 at 19:33

1 Answer


You can recover the document structure by walking the underlying XML of the docx file, where paragraphs and tables appear in document order. Try this:

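from docx import Document
from docx.text.paragraph import Paragraph

doc = Document("file.docx")
# Heading texts; this assumes the document uses the built-in "Heading N" styles.
headings = [p.text for p in doc.paragraphs if p.style.name.startswith('Heading')]
tables = doc.tables  # tables, in the same order as the 'table' tags below
tags = []
all_text = []
schema = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
# Walk the direct children of <w:body> in document order.
for child in doc.element.body.iterchildren():
    if child.tag == schema + 'tbl':
        tags.append('table')
    elif child.tag == schema + 'p':
        # Wrap the raw element in a Paragraph to read its full text.
        node_text = Paragraph(child, doc).text
        if node_text:
            if node_text in headings:
                tags.append('heading')
            else:
                tags.append('text')
            all_text.append(node_text)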

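After running the code above, tags holds the structure of the document as a sequence of 'heading', 'text' and 'table' entries in document order, and you can map the entries back to the data in the other lists. Note that only 'heading' and 'text' entries have a matching item in all_text; 'table' entries do not. To get the heading of a table, look backwards in tags from the 'table' entry to the nearest 'heading' entry.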
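The lookup described above can be sketched as a small helper. This is only an illustration, not part of python-docx: the function name headings_for_tables and the sample data are made up here, and it assumes tags and all_text were built as in the loop above (every 'heading' or 'text' tag consumes one item of all_text, while 'table' tags consume none).

```python
def headings_for_tables(tags, all_text):
    """Return, for each 'table' tag, the text of the nearest preceding heading.

    Tables that appear before any heading get None.
    """
    result = []
    current_heading = None
    text_index = 0  # position in all_text; only non-table tags consume an item
    for tag in tags:
        if tag == 'table':
            result.append(current_heading)
            continue
        if tag == 'heading':
            current_heading = all_text[text_index]
        text_index += 1
    return result


# Hypothetical sample data shaped like the output of the loop above.
tags = ['heading', 'text', 'table', 'heading', 'table', 'table']
all_text = ['1 Introduction', 'Some paragraph.', '1.1 Details']
print(headings_for_tables(tags, all_text))
# ['1 Introduction', '1.1 Details', '1.1 Details']
```

The returned list is in the same order as the tables in the document body, so it can be paired with document.tables using zip, provided no tables are nested inside other containers.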

abdulsaboor