31

I'm generating a report in Python from Pandas DataFrames. Currently I am using the DataFrame.to_string() method, but this writes the data to the file as plain text. Is there a way to write it as a table instead, so I can use table formatting?

Code:

SEMorgkeys = client.domain_organic(url, database = "us", display_limit = 10, export_columns=["Ph,Pp,Pd,Nq,Cp,Ur,Tr"])
org_df = pd.DataFrame(SEMorgkeys)

f = open(name, 'w')
f.write("\nOrganic:\n")
f.write(org_df.to_string(index=False,justify="left"))
f.close()

Current Printout (as string):

CPC    Keyword                        Position Difference Previous Position Search Volume Traffic (%) Url                                               
75.92       small business factoring   0                   1                 210          11.69       https://www..com/small-business-f...
80.19              factoring company   0                   8                1600           5.72       https://www..com/factoring-vs-ban...
spriore
  • It may be easier to write the data to a .csv and then copy/paste or import the table from Excel to Word – Benjamin James Nov 14 '16 at 19:41
  • For a single table, yes, I would agree. However, I'm looping through about a dozen URLs with about 6 DataFrames per loop. I'd really prefer not to have to create a .csv for 72 tables. – spriore Nov 14 '16 at 19:48
  • Could you add some additional information? Are you trying to write the dataframe as a formatted table in MS Word, or just add the lines of text as formatted by the `.to_string` method? – James Nov 14 '16 at 20:02
  • I would like to write the DataFrame as a table into Word. Then I intend to use table formatting in Word. – spriore Nov 14 '16 at 20:06

4 Answers

51

You can write the table straight into a .docx file using the python-docx library.

If you are using Conda, or installed Python through Anaconda, you can run this from the command line:

conda install python-docx --channel conda-forge

Or to pip install from the command line:

pip install python-docx

After that is installed, we can use it to open the file, add a table, and then populate the table's cell text with the data frame data.

import docx
import pandas as pd

# I am not sure how you are getting your data, but you said it is a
# pandas data frame
df = pd.DataFrame(data)

# open an existing document
doc = docx.Document('./test.docx')

# add a table to the end and create a reference variable
# extra row is so we can add the header row
t = doc.add_table(df.shape[0]+1, df.shape[1])

# add the header row; str() guards against non-string column labels
for j in range(df.shape[-1]):
    t.cell(0,j).text = str(df.columns[j])

# add the rest of the data frame
for i in range(df.shape[0]):
    for j in range(df.shape[-1]):
        t.cell(i+1,j).text = str(df.values[i,j])

# save the doc
doc.save('./test.docx')
James
  • what is `data` in `df = pd.DataFrame(data) ` – Pyd Dec 26 '17 at 10:27
  • 1
    @pyd `data` is the data source (what ever your input is) for your `DataFrame` – spriore Jan 09 '18 at 19:59
  • 2
    Is there a way to add borders around the table? The code works but I think my report would look better with borders around my Pandas Dataframe that is written to my word document. Thanks! :) – bbartling Jan 19 '18 at 14:21
  • 4
    @HenryHub Set the table's [style](http://python-docx.readthedocs.io/en/latest/user/styles-using.html), e.g., `t.style = 'Table Grid'` – David C May 24 '18 at 16:38
  • 1
    In my opinion, an extension to docx.Document class such as `.add_df_as_table()` would be quite useful. – precise May 30 '18 at 09:00
  • I installed docx with Anaconda 2 and 3 but I get the error "No module named docx" – user1581390 Jun 25 '18 at 01:18
  • 3
    In case you are creating a new document, use `doc = docx.Document()` instead of `doc = docx.Document('./test.docx')`. Otherwise you will get a PackageNotFoundError. – n1000 Sep 01 '19 at 19:59
  • Is there a way to make this work for google docs? I have tried copying from a .docx document and pasting to a google docs, but it does copy the table layout. – Gabriel Ziegler Nov 24 '20 at 00:30
  • @GabrielZiegler, check out the Google Docs API https://developers.google.com/docs/api/quickstart/python – James Nov 24 '20 at 11:46
  • Is there a way to add the table to a specific position between two paragraphs? – mouwsy Jan 15 '21 at 11:40
  • 1
    @James It's not working for multi-index column names, since we get a NoneType object in empty cells, I guess. Can you please verify? – JaySabir Feb 11 '21 at 20:59
  • @JaySabir, that would be a really good new question to post on SO. – James Feb 11 '21 at 21:44
5
def doctable(data, tabletitle, pathfile):
    from docx import Document
    from docx.shared import Pt, Mm
    import pandas as pd
    document = Document()
    section = document.sections[0]
    section.page_height = Mm(297)
    section.page_width = Mm(210)
    section.left_margin = Mm(20)
    section.right_margin = Mm(20)
    section.top_margin = Mm(20)
    section.bottom_margin = Mm(20)
    section.header_distance = Mm(12.7)
    section.footer_distance = Mm(12.7)
    data = pd.DataFrame(data) # My input data is in the 2D list form
    document.add_heading(tabletitle)
    # The first row of the input already holds the table headers, so no extra header row is added.
    table = document.add_table(rows=data.shape[0], cols=data.shape[1])
    table.allow_autofit = True
    table.autofit = True
    for i, column in enumerate(data) :
        for row in range(data.shape[0]) :
            table.cell(row, i).text = str(data[column][row])
    document.save(pathfile)
    return 0
Tedo Vrbanec
1

Use this, and improve it if you like:

import docx
import pandas as pd

def df_to_word(data: dict, report_name: str) -> docx.Document:
    assert isinstance(data, dict), 'data has to be dict'
    assert report_name.endswith('.docx'), 'report_name has to be a .docx file'
    df = pd.DataFrame(data)
    doc = docx.Document()

    # one extra row for the header
    table = doc.add_table(df.shape[0]+1, df.shape[1])

    # header row; str() guards against non-string column labels
    for j in range(df.shape[-1]):
        table.cell(0,j).text = str(df.columns[j])

    # data rows
    for i in range(df.shape[0]):
        for j in range(df.shape[-1]):
            table.cell(i+1,j).text = str(df.values[i,j])

    doc.save(f'./{report_name}')
    return doc


data = {
  "calorierbes": [420, 380, 390],
  "duratierbn": [50, 40, 45],
  "durationverg": [50, 40, 45],
  "duratiorgern": [50, 40, 45],
  "calorieers": [420, 380, 390],
  "calorierbers": [420, 380, 390]
}
df_to_word(data, 'report_4.docx')
1

Inspired by the answers above, I have added a function with the ability to include the index.

import docx
import pandas as pd
from pathlib import Path

def pd_table_to_word(df, save_to_path, include_index=False):
    if Path(save_to_path).exists():
        response = input("Document already exists and will be overwritten. Are you sure you want to overwrite it? Y/N ")
        if response.lower() not in ["y", "ye", "yes", "yeah"]:
            return "Aborted overwriting file."
    doc = docx.Document()

    # one extra row for the header, one extra column for the index if requested
    n_rows, n_cols = df.shape[0] + 1, df.shape[1]
    if include_index:
        n_cols += 1
    # column offset of the data cells
    offset = 1 if include_index else 0

    t = doc.add_table(n_rows, n_cols)

    # add the header row
    for j in range(df.shape[-1]):
        t.cell(0, j + offset).text = str(df.columns[j])

    # add the index name and values; cast to str because the name may be
    # None and the values may be numeric
    if include_index:
        t.cell(0, 0).text = "" if df.index.name is None else str(df.index.name)
        for i in range(df.shape[0]):
            t.cell(i+1, 0).text = str(df.index[i])

    # add the rest of the data frame
    for i in range(df.shape[0]):
        for j in range(df.shape[-1]):
            t.cell(i+1, j + offset).text = str(df.values[i,j])
    doc.save(save_to_path)
    return f"Table saved to {save_to_path}"
Joery
  • Very helpful, thanks! Something I noticed is the "df_docx" wasn't defined. Also, for some reason, I needed to cast the index as a string, or I got the error: "numpy.int64" object is not iterable. – LobstaBoy Dec 05 '22 at 02:06