Questions tagged [docx]

.docx is the file extension for files created using the default format of Microsoft Word 2007 or later. Use this tag when you are working with .docx files programmatically, such as generating a .docx file, extracting data from one, or editing one.

A .docx file uses the Microsoft Office Open XML WordprocessingML format, which is based on a zipped collection of eXtensible Markup Language (XML) files. The format is mostly standardized in ECMA-376 and ISO/IEC 29500.

Formerly, Microsoft Office used proprietary binary formats (.xls, .doc, .ppt); BIFF (Binary Interchange File Format) is the one used by Excel for .xls. Office now uses the OOXML (Office Open XML) formats by default. These files (.xlsx, .xlsm, .docx, .docm, .pptx, .pptm) are zipped XML.

.docx is the default Word format since Word 2007. It cannot contain any VBA (for security reasons, as stated by Microsoft).
.docm is the macro-enabled Word format, which can store VBA and execute macros.

A .docx file is a zip archive that contains folders and files such as the following:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //This folder contains most of the files that control the content of the document
|  +  document.xml //The actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Contains the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the Word document
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //This file maps relationship ids (such as rId7) to parts such as images, headers and footers
+  [Content_Types].xml
\--_rels
   \  .rels
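
Because the package is an ordinary zip archive, its layout can be inspected with any zip library. The following is a minimal sketch using Python's standard zipfile module; list_parts and top_level_folders are hypothetical helper names, and the archive is an in-memory stand-in with empty parts, not a document Word would open.

```python
import io
import zipfile


def list_parts(source):
    """Return the names of all entries in a .docx package (path or file object)."""
    with zipfile.ZipFile(source) as package:
        return package.namelist()


def top_level_folders(names):
    """Return the sorted top-level folder names found among the entry names."""
    return sorted({name.split("/", 1)[0] for name in names if "/" in name})


# In-memory stand-in that mirrors part of the layout shown above.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as package:
    for name in (
        "[Content_Types].xml",
        "_rels/.rels",
        "docProps/app.xml",
        "docProps/core.xml",
        "word/document.xml",
        "word/_rels/document.xml.rels",
        "word/media/image1.jpeg",
    ):
        package.writestr(name, b"")
sample.seek(0)

names = list_parts(sample)
print(top_level_folders(names))  # ['_rels', 'docProps', 'word']
print("word/document.xml" in names)  # True
```

With a real file, passing its path to list_parts should give the same kind of listing.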

The main content of a docx file resides in word/document.xml.

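The w:body element of a typical word/document.xml looks like this (the enclosing w:document root element and its namespace declarations are omitted):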

<w:body>
  <w:p w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidRDefault="0059122C" w:rsidP="0059122C">
    <w:r>
      <w:t>Hello </w:t>
    </w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r w:rsidR="008B4316">
      <w:t>W</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd"/>
    <w:r>
      <w:t>orld</w:t>
    </w:r>
    <w:bookmarkStart w:id="0" w:name="_GoBack"/>
    <w:bookmarkEnd w:id="0"/>
  </w:p>
  <w:sectPr w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidSect="001A6335">
    <w:headerReference w:type="default" r:id="rId7"/>
    <w:footerReference w:type="default" r:id="rId8"/>
    <w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
    <w:cols w:space="720"/>
    <w:docGrid w:linePitch="360"/>
  </w:sectPr>
</w:body>

The outermost tag shown is w:body (the whole document body), which is divided into multiple w:p elements (paragraphs). It ends with a w:sectPr element, which defines the section properties: the headers and footers used, the page size, and the margins.
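
To make this structure concrete, here is a small sketch that walks a body with Python's xml.etree.ElementTree. The DOCUMENT_XML string is a trimmed, hand-written stand-in for word/document.xml, and describe_body is a hypothetical helper; the two namespace URIs are the standard WordprocessingML and relationships namespaces.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS = {"w": W, "r": R}

# Trimmed stand-in for word/document.xml, including the root element.
DOCUMENT_XML = f"""<w:document xmlns:w="{W}" xmlns:r="{R}">
  <w:body>
    <w:p><w:r><w:t>Hello World</w:t></w:r></w:p>
    <w:sectPr>
      <w:headerReference w:type="default" r:id="rId7"/>
      <w:footerReference w:type="default" r:id="rId8"/>
      <w:pgSz w:w="12240" w:h="15840"/>
    </w:sectPr>
  </w:body>
</w:document>"""


def describe_body(xml_text):
    """Return (paragraph count, header/footer relationship ids, page size in inches)."""
    body = ET.fromstring(xml_text).find("w:body", NS)
    paragraphs = body.findall("w:p", NS)
    section = body.find("w:sectPr", NS)

    references = {}
    for kind in ("headerReference", "footerReference"):
        for ref in section.findall(f"w:{kind}", NS):
            references[kind] = ref.get(f"{{{R}}}id")

    # w:pgSz is expressed in twentieths of a point: 1440 units per inch.
    page = section.find("w:pgSz", NS)
    width = int(page.get(f"{{{W}}}w")) / 1440
    height = int(page.get(f"{{{W}}}h")) / 1440
    return len(paragraphs), references, (width, height)


count, references, size = describe_body(DOCUMENT_XML)
print(count)       # 1
print(references)  # {'headerReference': 'rId7', 'footerReference': 'rId8'}
print(size)        # (8.5, 11.0)
```

The rId7 and rId8 values are only identifiers; resolving them to header and footer files requires reading word/_rels/document.xml.rels.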

Inside a w:p, there are multiple w:r elements (runs). Each run carries its own formatting (text color, font size, ...) and typically contains one or more w:t elements (text parts).

As you can see, a simple sentence like Hello World might be split across multiple w:t elements, which makes templating quite difficult to implement.
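
One common way to cope with this when only reading text is to concatenate every w:t inside a paragraph before searching it. The sketch below does that with xml.etree.ElementTree; paragraph_text is a hypothetical helper and PARAGRAPH_XML is a simplified copy of the paragraph shown earlier. It ignores tabs, line breaks, and other non-text run content, and it does not address writing a replacement back into the runs.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Simplified copy of the sample paragraph: "Hello World" split over three runs.
PARAGRAPH_XML = f"""<w:p xmlns:w="{W}">
  <w:r><w:t>Hello </w:t></w:r>
  <w:proofErr w:type="spellStart"/>
  <w:r><w:t>W</w:t></w:r>
  <w:proofErr w:type="spellEnd"/>
  <w:r><w:t>orld</w:t></w:r>
</w:p>"""


def paragraph_text(paragraph):
    """Join the text of every w:t element in document order."""
    return "".join(t.text or "" for t in paragraph.iter(f"{{{W}}}t"))


paragraph = ET.fromstring(PARAGRAPH_XML)
pieces = [t.text for t in paragraph.iter(f"{{{W}}}t")]

print(pieces)                     # ['Hello ', 'W', 'orld']
print(paragraph_text(paragraph))  # Hello World

# No single w:t contains the whole sentence, but the joined text does.
print(any("Hello World" in piece for piece in pieces))  # False
print("Hello World" in paragraph_text(paragraph))       # True
```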

3020 questions
1
vote
2 answers

Export Microsoft word xml file into docx

I am trying to create a Microsoft Word document without using any 3rd party libraries. What I am trying to do is: create a template document in Microsoft Word, save it as an XML file, then read this XML file and populate the data in PHP. I am able to do…
Kiran
1
vote
1 answer

JodConverter with LibreOffice outputs all letters as squares after docx-to-pdf conversion

In order to convert docx files to pdf (or pdf-a to be precise), we are using JodConverter along with LibreOffice. This has been working fine for a week or so, but then suddenly all letters were represented as squares (usually indicating some…
Tobb
1
vote
0 answers

Programmatically inserting checkboxes in Excel

I have a question about manipulating Excel environment within C#. I am a beginner programmer and I never programmed in a different environment than C#, therefore, I'd kindly request to avoid VBA hints. My program has a function that reads out the…
TMB
1
vote
3 answers

Saving Outlook Message Using "docx" Format With C#

I use this code to save my mail message as a .doc file using interop : mailItem.SaveAs(newFileName, Microsoft.Office.Interop.Outlook.OlSaveAsType.olDoc); Now I have to save it as .docx but there is no OlSaveAsType.olDocx so how can I do this?
JD11
1
vote
1 answer

How can I use the DocX library to change the font globally, remove superfluous spaces, and remove or add extra line breaks?

I want to, using the DocX library [https://docx.codeplex.com/], convert a .docx document to use a different font. Does anybody know how to do that? The samples projects are very spare, and the documentation is nonexistent. I find, too, that often…
B. Clay Shannon-B. Crow Raven
1
vote
2 answers

Problem opening Office 07 documents in SharePoint 07 library with read-only permissions

The call center managers for my company use document libraries in a SharePoint 2007 site to post training material and information to our phone reps. These reps are given read-only access to the libraries as to not change the documents posted by…
Brent
1
vote
1 answer

How can I format the NumPr when I read

When I use docx4j to read a docx file, I want to get the number of a list item, just like: something other. I can get the text "something" but I can't get the "1." P p = (P) o; PPr ppr = p.getPPr(); NumPr npr = ppr.getNumPr(); if(npr!=null){ //how to…
Zane
1
vote
1 answer

Lightweight way to convert Docx to string using C#

I am writing a search program to find strings in HUGE collections of documents. I don't need to edit or view them; I just need to grab text as a string out of a Word doc (docx). What is the easiest / lightest-weight option out there? What I would like to do is…
Crash893
1
vote
2 answers

Regular Expression Matcher

I am using pattern matching to match file extension with my expression String for which code is as follows:- public static enum FileExtensionPattern { WORDDOC_PATTERN( "([^\\s]+(\\.(?i)(txt|docx|doc))$)" ), PDF_PATTERN( …
Manish
1
vote
1 answer

Convert DOCX to XML file

I need to use docx and xml files for a translation process. Not all of the translation tools can read xml, but they can read docx; I use xml because the texts can be better assigned to each other. I want to convert plain text from docx to xml and backwards (from xml…
user2994149
1
vote
2 answers

How can I use a .dot Template on .docx generation in PHP

I'm currently writing a docx generator in PHP. It creates tables, images, paragraphs etc. and saves everything in the correct structure to a .zip (.docx). Now I need to include some macros in that .docx. I have the macros in a .dot template on a…
user3025786
1
vote
3 answers

python setup.py install syntax error on Windows

This is probably a dumb question but I am having trouble installing a module from a tar.gz file on Windows. The module is docx. Of course for docx one needs lxml and PIL which I had no problems installing because there are binaries available. For…
griffsterb
1
vote
0 answers

Editing docx in python

I want to modify .docx document by finding some variable in it and replace it to another text. Variables are in specific format: . The problem is how my variables look in Word's .xml files: <$
Djent
1
vote
0 answers

How to get the first page of a docx file with Apache POI?

I'm creating an application with Java and Apache POI. I want to use only the first page of a docx file. How do I get only the first page of a docx file with Apache POI?
1
vote
1 answer

Adding Table In Cell of Another Table in docx file using POI

I am using POI to generate a docx file. I need to create a table in which a cell contains another table. How can this be done?
W A K A L E Y