Questions tagged [docx]

.docx is the file extension for files created using the default format of Microsoft Word 2007 or higher. Use this tag when you are working with .docx files programmatically, such as generating a .docx file, extracting data from one, or editing one.

.docx files use the Microsoft Office Open XML WordprocessingML format, which is a zipped collection of eXtensible Markup Language (XML) files. The format is mostly standardized in ECMA-376 and ISO/IEC 29500.

Formerly, Microsoft Office used binary formats (.xls, .doc, .ppt); BIFF (Binary Interchange File Format) is the name of the Excel one. Office now uses the OOXML (Office Open XML) formats. These files (.xlsx, .xlsm, .docx, .docm, .pptx, .pptm) are zipped XML.
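
Because these files are ordinary zip archives, any zip library can open them. The sketch below uses Python's standard zipfile module. To stay self-contained it builds a small in-memory archive whose part names mimic a .docx; the helper build_sample_package and its placeholder contents are illustrative assumptions, and Word would not open the result. With a real file, pass its path to list_parts instead.

```python
import io
import zipfile


def list_parts(docx_file):
    """Return the names of all parts stored in a zipped OOXML package."""
    with zipfile.ZipFile(docx_file) as package:
        return package.namelist()


def build_sample_package():
    """Build a tiny in-memory zip with .docx-like part names (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # Placeholder contents: real parts hold full XML documents.
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    return buffer


parts = list_parts(build_sample_package())
print(parts)
```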

.docx has been the default Word format since Word 2007. It cannot contain any VBA (for security reasons, as stated by Microsoft).
.docm is the macro-enabled Word format, which can store VBA and execute macros.
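
One way to tell the two apart programmatically is to look inside the zip. The sketch below assumes the usual convention that a macro-enabled document keeps its VBA in a part named word/vbaProject.bin. The function names and the in-memory sample packages are hypothetical, and this is a heuristic for illustration, not a security check.

```python
import io
import zipfile


def make_package(part_names):
    """Build an in-memory zip with empty parts (hypothetical helper for the demo)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name in part_names:
            package.writestr(name, b"")
    buffer.seek(0)
    return buffer


def has_vba_project(docx_file):
    """Return True if the package contains the conventional VBA storage part."""
    with zipfile.ZipFile(docx_file) as package:
        return "word/vbaProject.bin" in package.namelist()


plain = make_package(["[Content_Types].xml", "word/document.xml"])
macro = make_package(["[Content_Types].xml", "word/document.xml", "word/vbaProject.bin"])
print(has_vba_project(plain), has_vba_project(macro))  # False True
```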

A .docx file is a zip archive that contains the following folders and files:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //This folder contains most of the files that control the content of the document
|  +  document.xml //The actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Contains the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the Word document
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //This file tells Word where the images are located
+  [Content_Types].xml
\--_rels
   \  .rels
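
The relationships part listed in the tree, word/_rels/document.xml.rels, maps relationship ids to targets such as images. A minimal sketch of reading it with Python's standard library follows. The sample XML and the id rId4 are made up for illustration; the namespace and type URIs are the ones used by the packaging and Office relationship schemas. Targets are relative to the word/ folder.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
IMAGE_TYPE = OFFICE_REL + "/image"

# Hypothetical contents of word/_rels/document.xml.rels.
SAMPLE_RELS = f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId4" Type="{IMAGE_TYPE}" Target="media/image1.jpeg"/>
  <Relationship Id="rId7" Type="{OFFICE_REL}/header" Target="header1.xml"/>
</Relationships>"""


def image_targets(rels_xml):
    """Map relationship id -> target path for image relationships only."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.findall(f"{{{REL_NS}}}Relationship")
        if rel.get("Type") == IMAGE_TYPE
    }


print(image_targets(SAMPLE_RELS))  # {'rId4': 'media/image1.jpeg'}
```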

The main content of a docx file resides in word/document.xml.

The body of a typical word/document.xml looks like this:

<w:body>
  <w:p w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidRDefault="0059122C" w:rsidP="0059122C">
    <w:r>
      <w:t>Hello </w:t>
    </w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r w:rsidR="008B4316">
      <w:t>W</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd"/>
    <w:r>
      <w:t>orld</w:t>
    </w:r>
    <w:bookmarkStart w:id="0" w:name="_GoBack"/>
    <w:bookmarkEnd w:id="0"/>
  </w:p>
  <w:sectPr w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidSect="001A6335">
    <w:headerReference w:type="default" r:id="rId7"/>
    <w:footerReference w:type="default" r:id="rId8"/>
    <w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
    <w:cols w:space="720"/>
    <w:docGrid w:linePitch="360"/>
  </w:sectPr>
</w:body>

The w:body element holds the whole document content. It is divided into multiple w:p elements (paragraphs), followed by a w:sectPr, which defines the page size, the margins, and the headers/footers used for that document.
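
The w:pgSz values in the sample are expressed in twentieths of a point, so there are 1440 units per inch (20 per point, 72 points per inch). As a worked example, the sketch below converts the sample's 12240 x 15840 page to inches, which comes out as US Letter. The trimmed SECT_PR string and the function name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TWIPS_PER_INCH = 1440  # 20 twentieths of a point per point, 72 points per inch

# Trimmed copy of the w:sectPr from the sample, with the namespace declared.
SECT_PR = f'<w:sectPr xmlns:w="{W_NS}"><w:pgSz w:w="12240" w:h="15840"/></w:sectPr>'


def page_size_inches(sect_pr_xml):
    """Return (width, height) of the page in inches."""
    root = ET.fromstring(sect_pr_xml)
    pg_sz = root.find(f"{{{W_NS}}}pgSz")
    width = int(pg_sz.get(f"{{{W_NS}}}w")) / TWIPS_PER_INCH
    height = int(pg_sz.get(f"{{{W_NS}}}h")) / TWIPS_PER_INCH
    return width, height


print(page_size_inches(SECT_PR))  # (8.5, 11.0)
```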

Inside a w:p, there are multiple w:r elements (runs). Every run can define its own formatting (text color, font size, ...) and contains one or more w:t elements (text parts).

As you can see, a simple sentence like Hello World might be split across multiple w:r and w:t elements, which makes templating quite difficult to implement.
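
A common way around the split is to rebuild each paragraph's text by concatenating all of its w:t elements before searching or replacing. The following is a minimal sketch using only Python's standard library on a trimmed version of the sample above. It ignores tabs, line breaks, and paragraphs nested inside other paragraphs (for example in text boxes), so treat it as a starting point only.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Trimmed version of the sample, wrapped in a root that declares the namespace.
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t xml:space="preserve">Hello </w:t></w:r>
      <w:proofErr w:type="spellStart"/>
      <w:r><w:t>W</w:t></w:r>
      <w:proofErr w:type="spellEnd"/>
      <w:r><w:t>orld</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def paragraph_texts(document_xml):
    """Return one string per w:p, joining the text of all its w:t elements."""
    root = ET.fromstring(document_xml)
    texts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        pieces = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        texts.append("".join(pieces))
    return texts


print(paragraph_texts(SAMPLE))  # ['Hello World']
```

With a real file, the XML string would come from reading word/document.xml out of the zip archive.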

3020 questions

1 vote, 4 answers

Which PHP API or library is the best for converting from HTML to PDF and DOCX?

First, I tried to use Cloudconvert. It can convert between so many filetypes, but its PHP API causes memory leaks almost all the time. The second I tried was Pdfcrowd. It works perfectly, but it can convert only HTML to PDF. The third I tried was…
aleskva (reputation 1,644; badges 2, 21, 40)

1 vote, 1 answer

In need of a clear example on how to get the word count of DOC and DOCX files

I am able to read a DOC file and get its word count, BUT it is wrong. My code: public class WordCounter { public static void main(String[] args) throws Throwable { processDOC(); } private static void processDOC() throws…
iCodeLikeImDrunk (reputation 17,085; badges 35, 108, 169)

1 vote, 3 answers

Adding image with docx4j to doc file

I'm trying to add an image to a docx file using docx4j library within Android. I've faced to an exception: E/AndroidRuntime(21818): java.lang.ExceptionInInitializerError E/AndroidRuntime(21818): at…
kirik88 (reputation 65; badges 7)

1 vote, 2 answers

Convert DOCX to HTML including IMAGES

I am using DOCX4J to convert the DOCX to HTML. I have successfully done the conversion and got the HTML format. I will be using the HTML format to embed it as an EMAIL body to send an email. But I have some issues which are listed below.... Unable to…
user3522392 (reputation 21; badges 1, 2)

1 vote, 0 answers

open word document (.docx) without using external app in android

How do I open a .docx file without using external app in android? I have used Apache POI, OliveDocLibrary, Aspose.words libraries. using Apache POI, I have used required libraries, but still there is an exception while running the app (Exception…
parijatha (reputation 11; badges 2)

1 vote, 1 answer

Possible to Insert page in word document with python-docx?

I just read through the documentation on python-docx. They mention several times that added content is created at the end of the document, but I didn't notice any way to alter this functionality. Does anyone know how to add a new page to a…
Chockomonkey (reputation 3,895; badges 7, 38, 55)

1 vote, 1 answer

Apache POI - read docx with image in header

I'm trying to process docx file with Apache POI. Just simply read and then write file (just for now). Here is my simple code: FileInputStream fileInputStream = new FileInputStream(inputFile); XWPFDocument document = new…
Maciek Murawski (reputation 414; badges 4, 15)

1 vote, 1 answer

how to insert an element using docx4j

I have a .docx document with some tables at the top. These contain text placeholders that need replaced, which works fine. However, one of these tables needs to be repeated and filled with different values. I am able to deep copy the table and add…
user3170702 (reputation 1,971; badges 8, 25, 33)

1 vote, 1 answer

Can you remove a text line in PHP with OpenTBS if there is nothing merged

I have been using OpenTBS for modifying DOCX files. When I am merging my DB info with the DOCX file it always leaves blank data for any info in my DB that is missing. Here is an image of my 'source' document on the left and my merge on the right.…
CGSmith105 (reputation 490; badges 4, 17)

1 vote, 0 answers

Compiling docx with py2exe

I'm compiling a python script to .exe via py2exe. I originally started compiling it and the entire program ran fine aside from the Word document creation. My logfile would give me: ERROR: Could not close or save Word Document 'docName.docx' :, so…
signus (reputation 1,118; badges 14, 43)

1 vote, 2 answers

Force docx file download via PHP corrupt

I know there are a lot of mentions of this but I have tried all the suggestions and nothing seems to work. I have this script to force download files, but when using docx formats it downloads ok but then says the file is corrupt. However word does…

1 vote, 0 answers

How to remove specific pages in a word docx?

I would like to delete/remove specific pages in a word docx with Open XML SDK 2.0. I have tried using the code from here: blogs.msdn.com/b/brian_jones/archive/2009/06/15/removing-page-and-section-breaks-from-a-word-document.aspx It only deletes all…
Qwerty (reputation 323; badges 1, 6, 33)

1 vote, 0 answers

Where to find/how to extract image hyperlinks from zipped docx document?

If you zip a docx file, open it with winrar and click on view you can see the zip folder structure of the document. In word/_rels/document.xml.rels you can find the text hyperlinks but unfortunately not the links the images show to. Is there a way…
user2718671 (reputation 2,866; badges 9, 49, 86)

1 vote, 1 answer

change height of table row with C# Open XML and Word docx documents

I have a table in a docx file and I want to process it and change the height of a row. Here is my code so far WordprocessingDocument wordDoc = WordprocessingDocument.Open("path_to_file", true) ; Table table =…
TAL (reputation 153; badges 1, 3, 8)

1 vote, 1 answer

Function to get the content of a docx in php

private function read_docx($filename) { var_dump($filename); $striped_content = ''; $content = ''; $zip = zip_open($filename); if (!$zip || is_numeric($zip)) return false; while ($zip_entry = zip_read($zip)) { …
Mohammed Sufian (reputation 1,743; badges 6, 35, 62)