Questions tagged [docx]

.docx is the file extension for files created using the default format of Microsoft Word 2007 or higher. Use this tag when you are working with .docx files programmatically, such as generating .docx, extracting data from .docx or editing a .docx

.docx is the file extension for files created using the default format of Microsoft Word 2007 or higher. This is the Microsoft Office Open XML WordProcessingML format. This format is based around a zipped collection of eXtensible Markup Language (XML) files. Microsoft Office Open XML WordProcessingML is mostly standardized in ECMA 376 and ISO 29500.

Formerly, Microsoft used the BIFF (Binary Interchange File Format) binary format (.xls, .doc, .ppt). It now uses the OOXML (Office Open XML) format. These files (.xlsx, .xlsm, .docx, .docm, .pptx, .pptm) are zipped-XML.

.docx is the new default Word format, it cannot contain any VBA (for security reasons as stated by Microsoft).
.docm is the new Word format that can store VBA and execute macros.

The .docx format is a zipped file that contains the following folders:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Containst the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the word
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //this document tells word where the images are situated
+  [Content_Types].xml
\--_rels
   \  .rels

The main content of a docx file resides in word/document.xml.

A typical word/document.xml looks like this :

<w:body>
  <w:p w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidRDefault="0059122C" w:rsidP="0059122C">
    <w:r>
      <w:t>Hello </w:t>
    </w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r w:rsidR="008B4316">
      <w:t>W</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd"/>
    <w:r>
      <w:t>orld</w:t>
    </w:r>
    <w:bookmarkStart w:id="0" w:name="_GoBack"/>
    <w:bookmarkEnd w:id="0"/>
  </w:p>
  <w:sectPr w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidSect="001A6335">
    <w:headerReference w:type="default" r:id="rId7"/>
    <w:footerReference w:type="default" r:id="rId8"/>
    <w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
    <w:cols w:space="720"/>
    <w:docGrid w:linePitch="360"/>
  </w:sectPr>
</w:body>

The tags are w:body (for the whole document), and then the document is separated in multiple w:p (paragraphs). And a w:sectPr, which defines the headers/footers used for that document.

Inside a w:p, there are multiple w:r (runs). Every run defines its own style (color of the text, font-size, ...), and every run contains multiple w:t (text parts).

As you can see, a simple sentence like Hello World might be separated in multiple w:t, which makes templating quite difficult to implement.

3020 questions

votes

2 answers

Is there any way to read .docx file include auto numbering using python-docx

Problem statement: Extract sections from .docx file including autonumbering. I tried python-docx to extract text from .docx file but it excludes the autonumbering. from docx import Document document = Document("wadali.docx") def…

python docx python-docx

asked Aug 30 '18 at 09:59

wadali

2,221
1
20
38

votes

7 answers

Python: Convert PDF to DOC

How to convert a pdf file to docx. Is there a way of doing this using python? I've saw some pages that allow user to upload PDF and returns a DOC file, like PdfToWord Thanks in advance

python bash pdf docx doc

asked Oct 14 '14 at 10:16

AlvaroAV

10,335
12
60
91

votes

5 answers

Merge multiple word documents into one Open Xml

I have around 10 word documents which I generate using open xml and other stuff. Now I would like to create another word document and one by one I would like to join them into this newly created document. I wish to use open xml, any hint would be…

c# merge openxml docx openxml-sdk

asked Aug 21 '13 at 07:55

Incredible

3,495
8
49
77

votes

4 answers

How can I debug a corrupt docx file?

I have an issue where .doc and .pdf files are coming out OK but a .docx file is coming out corrupt. In order to solve that I am trying to debug why the .docx is corrupt. I learned that the docx format is much stricter with regard to extra…

xml debugging docx corrupt

asked Aug 12 '13 at 18:05

Martin Hansen Lennox

2,837
2
23
64

votes

4 answers

Using Vim to edit Microsoft Word files

I've found ViEmu, a vi emulator for microsoft word. However, I wanted to use vim to edit DOC or even rtf files. Is this possible ? Are they any other formats that preserve page/paragraph layout compatible with both Microsoft Word and Vim? I am also…

vim ms-word rtf docx doc

asked Sep 23 '10 at 12:13

Kilon

1,962
3
16
23

votes

5 answers

OpenXML 2 SDK - Word document - Create bulleted list programmatically

Using the OpenXML SDK, 2.0 CTP, I am trying to programmatically create a Word document. In my document I have to insert a bulleted list, an some of the elements of the list must be underlined. How can I do this?

ms-word openxml docx bulletedlist

asked Dec 21 '09 at 15:57

kjv

11,047
34
101
140

votes

4 answers

Apache POI or docx4j for dealing with docx documents

What do you think Which is better to use to read docx document as java objects and why ? in other words. which library supports most of the word tags ?

java apache-poi docx docx4j

asked Feb 21 '13 at 22:57

becks

2,656
8
35
64

votes

4 answers

Figure sizes with pandoc conversion from markdown to docx

I type a report with Rmarkdown in Rstudio. When converting it in html with knitr, there is also a markdown file produced by knitr. I convert this file with pandoc as follows : pandoc -f markdown -t docx input.md -o output.docx The output.docx file…

image markdown docx knitr pandoc

asked Feb 12 '13 at 09:57

Stéphane Laurent

75,186
15
119
225

votes

7 answers

DOCX File type in PHP finfo_file is application/zip

hello I'm trying to validate an uploaded file type by finfo_file function. But when a .docx file is sent, the file type is: application/zip instead of: application/vnd.openxmlformats-officedocument.wordprocessingml.document how can I change this…

php validation docx file-type

asked Jul 06 '11 at 10:49

WooDzu

4,771
6
31
61

votes

9 answers

Convert Html to Docx in c#

i want to convert a html page to docx in c#, how can i do it?

c# docx

asked Mar 25 '11 at 11:04

Luis

2,665
8
44
70

votes

2 answers

PHP Convert Word file to HTML without losing styling and images

Is there an API for converting word files to HTML without losing the format? Can the google documents API be used for this? I tried saaspose but the returning result is always a server error. Solutions that did not work for me: Converting MS Word…

php ms-word docx doc

asked Feb 21 '13 at 07:13

Herr

2,725
3
30
36

votes

5 answers

unoconv not working while trying to convert. throws Error: Unable to connect or start own listener. Aborting

I am trying to convert docx to pdf using unoconv, but getting Error: Unable to connect or start own listener. Aborting. when I run unoconv -f pdf 1234.docx. So, there must be some listener. I then started the listener via unoconv --listener. I tried…

ms-office openoffice.org docx libreoffice file-conversion

asked Feb 13 '12 at 11:48

Krishna Prasad Varma

4,670
5
29
41

votes

4 answers

Version-controlling zipped files (docx, odt)

There are formats that are actually zip files in disguise, e.g. docx or odt. If I store them directly in version control, they are handled as binary files. My ideal solution would be have a hook that creates a foo.docx/ directory for each…

version-control mercurial zip openoffice.org docx

asked Sep 21 '10 at 22:41

Adam Schmideg

10,590
10
53
83

votes

3 answers

How to setup cell borders with python-docx

I need to setup cells borders in table with python-docx, but can't find how to. Please help.

python docx python-docx

asked Oct 11 '15 at 20:17

Valentin

votes

1 answer

Parsing of table from .docx file

I want to parse a table from a .docx file using Python and python-docx into some useful data structure. The .docx file contains only a single table in my case. I've uploaded it so you can have a look. Here's a screenshot:

python xml parsing docx python-docx

asked Jan 09 '15 at 13:32

Sreedhar

Prev 1

…

99 100 Next