Is there any tool to extract all tables from a Word document and convert them to a CSV file or an Excel file, using Python or VBA?

Note that the Word file contains both text and tables.

Tomerikoo
Najim BOUHAFA
  • Does this answer your question? [python -docx to extract table from word docx](https://stackoverflow.com/questions/46618718/python-docx-to-extract-table-from-word-docx) – Kraay89 Feb 24 '21 at 12:27
  • Does this answer your question? [Write tables from Word (.docx) to Excel (.xlsx) using xlsxwriter](https://stackoverflow.com/questions/62043218/write-tables-from-word-docx-to-excel-xlsx-using-xlsxwriter) – Tomerikoo Apr 23 '21 at 10:21
  • [Python - Convert tables from .doc / .docx-files to .xls](https://stackoverflow.com/q/17591195/6045800) – Tomerikoo Apr 23 '21 at 10:21

1 Answer

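You can use pandas with python-docx. Per this answer, you can extract all tables from a document and collect them in a list of DataFrames. Because document.tables contains only the tables, the surrounding text in the Word file is ignored: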

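from docx import Document
import pandas as pd

document = Document('test.docx')

# Build one DataFrame per table found in the document
tables = []
for table in document.tables:
    # Start from a rows x columns grid of empty strings
    data = [['' for _ in range(len(table.columns))] for _ in range(len(table.rows))]
    for i, row in enumerate(table.rows):
        for j, cell in enumerate(row.cells):
            # Copy the cell text; empty cells keep the '' placeholder
            if cell.text:
                data[i][j] = cell.text
    tables.append(pd.DataFrame(data))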
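
The DataFrames built above get pandas' default 0, 1, 2, ... column labels. If the first row of a Word table holds the column names, it can be promoted to the header. The following is a minimal sketch of that idea, not part of the original answer; rows_to_frame and first_row_is_header are hypothetical names, and the rows list stands in for the cell text collected from one table:

```python
import pandas as pd


def rows_to_frame(rows, first_row_is_header=True):
    """Build a DataFrame from a list of rows (each a list of cell strings).

    Hypothetical helper: when first_row_is_header is True and there is at
    least one data row, the first row becomes the column labels.
    """
    if first_row_is_header and len(rows) > 1:
        header, *body = rows
        return pd.DataFrame(body, columns=header)
    # Otherwise keep every row as data with default integer labels
    return pd.DataFrame(rows)


# Stand-in for the cell text of one table
rows = [["Name", "Qty"], ["apple", "3"], ["pear", "5"]]
frame = rows_to_frame(rows)
print(frame)
```

Whether the first row really is a header depends on the document, so this is best kept optional.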

You can then save the tables to CSV files by looping through the list:

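# Writes table_0.csv, table_1.csv, ...; pass index=False, header=False
# to to_csv to leave out the default 0, 1, 2, ... labels
for nr, df in enumerate(tables):
    df.to_csv(f"table_{nr}.csv")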
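
Since the question also asks about Excel output, the same list can be written to a single workbook with one sheet per table. This is a rough sketch rather than part of the original answer: save_tables_to_excel and the file name tables.xlsx are made-up names, and it assumes an Excel engine such as openpyxl or XlsxWriter is installed alongside pandas.

```python
import pandas as pd


def save_tables_to_excel(tables, path="tables.xlsx"):
    """Write each DataFrame in `tables` to its own sheet of one workbook.

    Hypothetical helper; needs an Excel engine (e.g. openpyxl) for .xlsx.
    """
    if not tables:
        # A workbook needs at least one sheet, so refuse an empty list
        raise ValueError("no tables to write")
    with pd.ExcelWriter(path) as writer:
        for nr, df in enumerate(tables):
            # index=False and header=False omit pandas' 0, 1, 2, ... labels
            df.to_excel(writer, sheet_name=f"table_{nr}",
                        index=False, header=False)
    return path
```

Calling save_tables_to_excel(tables) with the list built earlier would produce sheets named table_0, table_1, and so on.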
RJ Adriaansen