python-docx: Parse a table to a pandas DataFrame

Question

I'm using the python-docx library to extract content from an MS Word document. I can get all the tables from the document with the same library. However, I'd like to parse each table into a pandas DataFrame. Is there any built-in functionality for this, or do I have to do it manually? Also, is it possible to find out the name of the heading under which a table sits? Thank you.

from docx import Document
from docx.shared import Inches
document = Document('test.docx')

tabs = document.tables

score 15 · Accepted Answer · edited May 05 '20 at 05:12

You can extract the tables from the document into DataFrames with this code:

from docx import Document
import pandas as pd

document = Document('test.docx')

tables = []
for table in document.tables:
    # Pre-fill a rows x columns grid with empty strings
    data = [['' for _ in range(len(table.columns))] for _ in range(len(table.rows))]
    for i, row in enumerate(table.rows):
        for j, cell in enumerate(row.cells):
            if cell.text:
                data[i][j] = cell.text
    # One DataFrame per table, with default integer column labels
    tables.append(pd.DataFrame(data))
print(tables)

The tables variable then holds one DataFrame per table in the document.
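
If the first row of each Word table holds the column names, it can be promoted to the DataFrame header instead of being kept as data. The following is only a minimal sketch under that assumption: table_to_dataframe is a hypothetical helper name, not part of python-docx or pandas, and it relies only on the rows, cells, and text attributes already used in this answer.

```python
import pandas as pd


def table_to_dataframe(table, header=True):
    """Convert a python-docx table to a DataFrame.

    Hypothetical helper: assumes every row has the same number of cells
    and, when header=True, that the first row holds the column names.
    """
    # Read the text of every cell, row by row
    data = [[cell.text for cell in row.cells] for row in table.rows]
    if header and data:
        # First row becomes the column labels, the rest become the data
        return pd.DataFrame(data[1:], columns=data[0])
    return pd.DataFrame(data)
```

With the document object from the code above, this could be called as tables = [table_to_dataframe(t) for t in document.tables]. Passing header=False keeps the default integer column labels.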

edited May 05 '20 at 05:12

matan h

answered Oct 11 '19 at 07:16

abdulsaboor

This is by far the best answer I have seen. This beautiful piece of code does what not even Camelot or Tabula could do. Awesome work!!! – Somesh Gupta Apr 28 '22 at 19:35

score 1 · Answer 2 · edited Oct 08 '20 at 20:05

A similar alternative (I did not test it with multiple tables).
This gave me the DataFrame format I was looking for:

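from docx import Document
import pandas as pd

# Assumption: firstdoc is loaded the same way as in the question
firstdoc = Document('test.docx')

# Created once, before the loop, so rows from every table are kept
doctbls = []
for table in firstdoc.tables:
    tbllist = []
    for row in table.rows:
        rowlist = []
        for cell in row.cells:
            rowlist.append(cell.text)
        tbllist.append(rowlist)
    doctbls = doctbls + tbllist

# Rows of all tables are stacked into a single DataFrame
finaltables = pd.DataFrame(doctbls)
# In a Jupyter notebook, display(finaltables) renders it as a table
print(finaltables)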
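
The question also asks which heading each table sits under. One possible approach, sketched here under stated assumptions, is to walk the paragraphs and tables in document order and remember the most recent heading. tables_with_headings is a hypothetical helper name, and the sketch assumes headings use Word's built-in "Heading 1", "Heading 2", ... styles.

```python
def tables_with_headings(blocks):
    """Pair each table with the text of the closest preceding heading.

    `blocks` is any iterable of paragraph and table objects in document
    order. Assumption: headings use the built-in "Heading N" styles.
    """
    current_heading = None
    pairs = []
    for block in blocks:
        if hasattr(block, "rows"):
            # Tables expose .rows; paragraphs do not
            pairs.append((current_heading, block))
        else:
            style_name = block.style.name or ""
            if style_name.startswith("Heading"):
                current_heading = block.text
    return pairs
```

In recent python-docx releases, document.iter_inner_content() yields paragraphs and tables in document order, so tables_with_headings(document.iter_inner_content()) should return (heading text, table) pairs; older versions may not have that method. A table that appears before any heading is paired with None.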

edited Oct 08 '20 at 20:05

abdulsaboor

answered Aug 04 '20 at 04:29

AshleyOboe

@abdulsaboor can you please help on this one https://stackoverflow.com/questions/75098471/extract-a-word-table-from-multiple-docx-files-using-python-docx – sunny babau Jan 12 '23 at 15:12
