Questions tagged [python-docx]

A python library to create, read and write Microsoft Office Word 2007 docx files.

The docx module creates, reads and writes Microsoft Office Word 2007 docx files.

##Including the following features:

###Creation:

  • Paragraphs
  • Bullets
  • Numbered lists
  • Document properties (author, company, etc)
  • Multiple levels of headings
  • Tables
  • Section and page breaks
  • Images

###Modification:

  • Search and replace
  • Extract plain text of document
  • Add and delete items anywhere within the document
  • Change document properties
  • Run xpath queries against particular locations in the document - useful for retrieving data from user-completed templates.

For detailed information and examples, visit the python-docx documentation.

Available from pypi.

See also the official GitHub homepage.

1408 questions
7
votes
0 answers

Extract Header and Table text from a .docx file

I'm trying to extract page and header data from a docx file. The file is several hundred pages, each with a table and a header. The header has pertinent information that needs to be paired with each table. I'm able to extract the header and table…
user2682863
  • 3,097
  • 1
  • 24
  • 38
7
votes
3 answers

Update the TOC (table of content) of MS Word .docx documents with Python

I use the python package "python-docx" to modify the structure amd content of MS word .docx documents. The package lacks the possibility to update the TOC (table of content) [Python: Create a "Table Of Contents" with python-docx/lxml. Are there…
thinwybk
  • 4,193
  • 2
  • 40
  • 76
7
votes
1 answer

Finding word on page(s) in document

I am looking for an elegant solution to find on what page(s) in a document a certain word occurs that I have stored in a python dictionary/list. I first considered .docx format as an input and had a look at PythonDocx which has a search function,…
birgit
  • 1,121
  • 2
  • 21
  • 39
7
votes
1 answer

How to find a list in docx using python?

I'm trying to pull apart a word document that looks like this: 1.0 List item 1.1 List item 1.2 List item 2.0 List item It is stored in docx, and I'm using python-docx to try to parse it. Unfortunately, it loses all the numbering at the start. I'm…
SeanVDH
  • 866
  • 1
  • 10
  • 28
7
votes
2 answers

Python docx library text align

I am using python docx library to manipulate a word document. However I can't find how to align a line to the center in the documents page of that library. I can't find by google either. from docx import Document document = Document() p…
Levent Altunöz
  • 273
  • 1
  • 3
  • 10
7
votes
1 answer

Generate the MS word document in django

Currently i am generating the reports in pdf format. But now i want to generate the reports in ms word or docx format. my api.py file def export_pdf(request,id): report = Report.objects.get(id=id) options1 =…
user3541454
  • 139
  • 1
  • 2
  • 9
6
votes
4 answers

extremely slow add a table to python-docx from a csv file

I have to add a table from a CSV file around 1500 rows and 9 columns, (75 pages) in a docx word document. using python-docx. I have tried differents approaches, reading ths csv with pandas or directly openning de csv file, It cost me around 150…
6
votes
2 answers

python-docx: Parse a table to Panda Dataframe

I'm using the python-docx library to extract ms word document. I'm able to get all the tables from the word document by using the same library. However, I'd like to parse the table into a panda data frame, is there any built-in functionality I can…
Ali Asad
  • 1,235
  • 1
  • 18
  • 33
6
votes
1 answer

python-docx how to merge row cells

I am creating a Word document from data using python-docx. I can create all the rows and cells with no problem but, in some cases, when the current record from the database has some content in field comment, I need to add a new line to display a…
HuLu ViCa
  • 5,077
  • 10
  • 43
  • 93
6
votes
1 answer

ImportError: cannot import name Document

When I am running from docx import Document I am getting error as ImportError: cannot import name Document I am working on Python 2.7.
Siddhi Sawant
  • 63
  • 1
  • 4
6
votes
1 answer

Python-docx: Is it possible to add a new run to paragraph in a specific place (not at the end)

I want to set a style to a corrected word in MS Word text. Since it's not possible to change text style inside a run, I want to insert a new run with new style into the existing paragraph... for p in document.paragraphs: for run in p.runs: …
Igor Savinkin
  • 5,669
  • 8
  • 37
  • 69
6
votes
1 answer

Accepting all changes in a MS Word document by using Python

I want to be able to accept all changes from a MS Word (.docx) document from Python, preferably using python-docx module. I know how to do in Perl (see below for reference) but would like to have native code in my Python program to do the same.…
Jean-Francois T.
  • 11,549
  • 7
  • 68
  • 107
6
votes
1 answer

how to generate RGBcolor in RGBColor(0x42, 0x24, 0xE9) format

I am working with python docx and here I am stuck. from docx import Document document = Document() run = document.add_paragraph().add_run() font = run.font from docx.shared import RGBColor font.color.rgb = RGBColor(0x42, 0x24, 0xE9) This generate…
Rahul
  • 10,830
  • 4
  • 53
  • 88
6
votes
1 answer

Python - Remove header and footer from docx file

I need to remove headers and footers in many docx files. I was currently trying using python-docx library, but it doesn't support header and footer in docx document at this time (work in progress). Is there any way to achieve that in Python? As I…
drjackild
  • 473
  • 4
  • 17
6
votes
2 answers

add content to existing docx with python-docx

I'd like to open an existing word document where I already added page numbers and just add some text and headline to it. Here's a basic example of how I tried to accomplish my goal #!/usr/bin/env python from docx import Document document =…
mat1010
  • 756
  • 1
  • 9
  • 17