Questions tagged [docx]

.docx is the file extension for files created using the default format of Microsoft Word 2007 or higher. Use this tag when you are working with .docx files programmatically, such as generating .docx, extracting data from .docx or editing a .docx

The .docx format is Microsoft Office Open XML WordprocessingML: a zipped collection of eXtensible Markup Language (XML) files. It is mostly standardized in ECMA-376 and ISO/IEC 29500.
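Because the container is an ordinary ZIP archive, Python's standard zipfile module can open it. The following is a minimal sketch: make_stand_in is a hypothetical helper that builds a tiny placeholder archive in memory (not a document Word could open), and list_parts returns the names of the parts stored in it.

```python
import io
import zipfile


def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the names of all parts stored in a .docx container."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.namelist()


def make_stand_in() -> bytes:
    """Build a tiny ZIP that mimics the layout of a .docx (placeholder content only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("_rels/.rels", "<Relationships/>")
        archive.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_parts(make_stand_in()):
        print(name)
```

With a real file, the bytes would come from open(path, "rb").read() instead of the stand-in.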

Formerly, Microsoft Office used binary formats (.xls, .doc, .ppt), such as BIFF (Binary Interchange File Format) for Excel. It now uses the OOXML (Office Open XML) formats. These files (.xlsx, .xlsm, .docx, .docm, .pptx, .pptm) are zipped XML.

.docx is the new default Word format; it cannot contain any VBA (for security reasons, as stated by Microsoft).
.docm is the new Word format that can store VBA and execute macros.
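One rough way to tell whether a Word package carries macros is to look for the VBA project part inside the archive. This sketch assumes the part sits at its usual location, word/vbaProject.bin; make_package is a hypothetical helper that builds placeholder archives for the demonstration, and a more robust check would follow the package relationships instead of relying on the part name.

```python
import io
import zipfile

# Assumption: the usual location of the VBA project part in a macro-enabled Word file.
VBA_PART = "word/vbaProject.bin"


def has_vba_project(package_bytes: bytes) -> bool:
    """Return True if the archive contains a VBA project part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return VBA_PART in archive.namelist()


def make_package(with_macros: bool) -> bytes:
    """Build a placeholder archive for the demo; not a document Word could open."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<document/>")
        if with_macros:
            archive.writestr(VBA_PART, b"placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    print(has_vba_project(make_package(with_macros=False)))
    print(has_vba_project(make_package(with_macros=True)))
```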

A .docx file is a ZIP archive that contains the following folders and files:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Contains the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the Word document
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //this file tells Word where the images are located
+  [Content_Types].xml
\--_rels
   \  .rels
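To illustrate how word/_rels/document.xml.rels points at the embedded images, here is a minimal sketch using Python's xml.etree.ElementTree. SAMPLE_RELS is a made-up relationships file (the rId4 image entry is hypothetical), and the code assumes that targets are relative to the word folder rather than absolute or external.

```python
import posixpath
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_RELS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
IMAGE_TYPE = OFFICE_RELS + "/image"

# Made-up sample content for word/_rels/document.xml.rels.
SAMPLE_RELS = f"""<Relationships xmlns="{RELS_NS}">
  <Relationship Id="rId4" Type="{OFFICE_RELS}/image" Target="media/image1.jpeg"/>
  <Relationship Id="rId8" Type="{OFFICE_RELS}/footer" Target="footer1.xml"/>
</Relationships>"""


def image_targets(rels_xml: str) -> dict[str, str]:
    """Map each image relationship id to the path of the image part in the archive."""
    root = ET.fromstring(rels_xml)
    targets = {}
    for rel in root.iter(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Type") == IMAGE_TYPE:
            # Assumption: the target is relative to the "word" folder.
            targets[rel.get("Id")] = posixpath.join("word", rel.get("Target"))
    return targets


if __name__ == "__main__":
    print(image_targets(SAMPLE_RELS))
```

The relationship id is what elements in document.xml use (through r:id or r:embed attributes) to refer to the part.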

The main content of a docx file resides in word/document.xml.

A typical word/document.xml contains a body like this:

<w:body>
  <w:p w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidRDefault="0059122C" w:rsidP="0059122C">
    <w:r>
      <w:t>Hello </w:t>
    </w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r w:rsidR="008B4316">
      <w:t>W</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd"/>
    <w:r>
      <w:t>orld</w:t>
    </w:r>
    <w:bookmarkStart w:id="0" w:name="_GoBack"/>
    <w:bookmarkEnd w:id="0"/>
  </w:p>
  <w:sectPr w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidSect="001A6335">
    <w:headerReference w:type="default" r:id="rId7"/>
    <w:footerReference w:type="default" r:id="rId8"/>
    <w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
    <w:cols w:space="720"/>
    <w:docGrid w:linePitch="360"/>
  </w:sectPr>
</w:body>

The w:body element holds the whole document, which is divided into multiple w:p elements (paragraphs). It ends with a w:sectPr element, which defines the section properties: the headers and footers used for the document, the page size, and the margins.
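The measurements in w:sectPr are expressed in twentieths of a point, so 1440 units make one inch. As a small sketch (SAMPLE_SECT_PR is a trimmed copy of the section properties above, with the namespace declared so that it parses on its own), the page size can be read like this:

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TWIPS_PER_INCH = 1440  # 20 twips per point, 72 points per inch

# Trimmed copy of the w:sectPr element shown above.
SAMPLE_SECT_PR = (
    f'<w:sectPr xmlns:w="{W_NS}">'
    '<w:pgSz w:w="12240" w:h="15840"/>'
    "</w:sectPr>"
)


def page_size_inches(sect_pr_xml: str) -> tuple[float, float]:
    """Return (width, height) of the page in inches."""
    sect_pr = ET.fromstring(sect_pr_xml)
    pg_sz = sect_pr.find(f"{{{W_NS}}}pgSz")
    width = int(pg_sz.get(f"{{{W_NS}}}w"))
    height = int(pg_sz.get(f"{{{W_NS}}}h"))
    return width / TWIPS_PER_INCH, height / TWIPS_PER_INCH


if __name__ == "__main__":
    print(page_size_inches(SAMPLE_SECT_PR))  # (8.5, 11.0), i.e. US Letter
```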

Inside a w:p, there can be multiple w:r (runs). Each run carries its own formatting (text color, font size, ...) and holds its text in w:t elements (text parts).

As you can see, a simple sentence like Hello World might be split across several runs and w:t elements, which makes templating quite difficult to implement.
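A common workaround is to join the text of all w:t elements of a paragraph before searching or replacing. The sketch below does this with Python's xml.etree.ElementTree on a trimmed copy of the sample above (SAMPLE_BODY, with the namespace declared). It is only an illustration: it ignores tabs and line breaks, and paragraphs nested inside other paragraphs (for example in text boxes) would be reported more than once.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # prefix for namespaced tag names in ElementTree

# Trimmed copy of the sample body shown above.
SAMPLE_BODY = f"""<w:body xmlns:w="{W_NS}">
  <w:p>
    <w:r><w:t>Hello </w:t></w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r><w:t>W</w:t></w:r>
    <w:proofErr w:type="spellEnd"/>
    <w:r><w:t>orld</w:t></w:r>
  </w:p>
</w:body>"""


def paragraph_texts(body_xml: str) -> list[str]:
    """Return the text of each paragraph, joining all of its w:t parts."""
    body = ET.fromstring(body_xml)
    return [
        "".join(t.text or "" for t in paragraph.iter(W + "t"))
        for paragraph in body.iter(W + "p")
    ]


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE_BODY))  # ['Hello World']
```

Writing a replacement back is harder than reading, because the new text has to be placed into one of the existing runs without losing the formatting of the others.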

3020 questions
1 vote, 1 answer

converting docx document to pdf with docx4j not the same

Good evening! I convert from a docx document programatically (java docx4j) to pdf. I get the pdf document from my docx document but the pdf is not exactly the same as the docx document. (lines between numbers are lost and no bold headline, please…
duracell
1 vote, 1 answer

c# Open XML SDK update attached template in DOCX

I am opening existing .docx files from a SharePoint Document Library over the SharePoint web services, and am attempting to attach a new Template to them. The current code for this piece seems to not be doing anything at all. XNamespace w =…
ross_ritchey
1 vote, 0 answers

Incorrect parsing of Textbox in docx by OpenXML

I am reading a .docx file using OpenXML in C#. It reads everything correctly but strangely, the content of textbox is being read thrice. What could be wrong? Here is the code to read .docx: public static string TextFromWord(String file) { …
Maxsteel
1 vote, 1 answer

Recognizing colors in text from a docx

I'm trying to write a program that reads a docx file and checks whether some of the text is colored. For instance, imagine if all the words bolded in this sentence were actually written in some arbitrary color. I want my program to recognize that…
user2858182
1 vote, 0 answers

Using python docx to create a document and need to modify the paragraph style and save it

I am trying to modify "Text Body" style for paragraph in word 2010 so that the Below paragraph spacing is much less. But when I change that value it will not save it so that when I reopen word the modifications are gone. The reason I want to save…
peztherez
1 vote, 1 answer

Pushing values into Ms Word docx format document with Mustache markup?

I have seen a script that allow mustache styled markup in a docx to be populated from code I cannot find such again. Does anyone know of such a script.
Xdrone
1 vote, 2 answers

How to add comments to a .docx XML

At work, we have a word document that we have to edit all the time to pass on to another team, to tell them how to perform some tasks. Since I don't like mindlessly filling out data, and I always look for ways to simplify the tasks I have to do, I…
Alex
1 vote, 1 answer

docx4j html export with list

I have a problem with exporting a docx document to html with docx4j. My application cuts paragraphs out from several documents, then concatenates it into a single one, then exports it into html. The problem is with the lists. The generated docx…
omniflash
1 vote, 1 answer

Word Openxml: how to get a text box the right size?

I'm using PHP to generate docx documents from a database. The generated document contains column charts which have labels attached (i.e. user shapes containing textboxes). In an attempt to get the textboxes to accommodate and display all of the text…
munder
1 vote, 0 answers

How to add style to an exisiting table in docx using openXML SDK2.0

I am working on a project, which involves with creating data sheet. I have managed to create tables in docx file by following code provided from MSDN, but when I try to apply style (eg. border set to none, shading, cell width, etc.) to the table I…
Owen Wang
1 vote, 2 answers

Print a docx file with printer dialog using C#

I have a docx document, which I want to print from code behind in C#. I had gone through forums and few say, its not possible, i will have to use JavaScript. How to specify file in JavaScript, print code? So far I have done in code behind direct…
Incredible
1 vote, 1 answer

Exact same file and code. So why does the binary of my docx file always end differently?

We take a (non-corrupted) .docx file from our server and post it via httprequest to an API. When downloading it from the API it comes out corrupted. I 99% sure that this is down to the code that posts the file, not the API. It turns out the…
Martin Hansen Lennox
1 vote, 1 answer

Inserting the degree symbol into the word document using python

I am using the python https://github.com/mikemaccana/python-docx module and I am trying to just simply add the degree symbol to my word document and can not see how to do it. Just have a string something like this: Degree = "some_numberº" and then…
peztherez
1 vote, 2 answers

Clear new lines in docx

I've a docx file, this contains a lot of new lines between sections, I need to clear a new line when it appears on more than one occasion consecutively. I unzip the file using: z = zipfile.ZipFile('File.docx','a') z.extractall() Inside of a…
Marco Herrarte
1 vote, 1 answer

.docx problems with SharePoint Designer workflow

So I have a document library with date, alert and alert-date fields. The date and alert fields are completed when a doc is uploaded, and there is a workflow which takes the alert away from the date (and also takes an extra day off) and sets it as…