Questions tagged [docx]

.docx is the file extension for files created using the default format of Microsoft Word 2007 or higher. Use this tag when you are working with .docx files programmatically, such as generating a .docx file, extracting data from one, or editing one.

.docx is the file extension for files created using the default format of Microsoft Word 2007 or higher. This is the Microsoft Office Open XML WordprocessingML format, which is built around a zipped collection of eXtensible Markup Language (XML) files. Office Open XML WordprocessingML is mostly standardized in ECMA-376 and ISO/IEC 29500.

Formerly, Microsoft Office used proprietary binary formats (.xls, .doc, .ppt); the Excel one is known as BIFF (Binary Interchange File Format). Office now uses the OOXML (Office Open XML) format. These files (.xlsx, .xlsm, .docx, .docm, .pptx, .pptm) are zipped XML.

.docx is the new default Word format. It cannot contain any VBA (for security reasons, as stated by Microsoft).
.docm is the new Word format that can store VBA and execute macros.

A .docx file is a ZIP archive that contains the following files and folders:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Contains the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the Word document
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //This file tells Word where the images are located
+  [Content_Types].xml
\--_rels
   \  .rels
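
Because the package is an ordinary ZIP archive, its parts can be listed with any ZIP library. The following is a minimal sketch in Python using only the standard library. The helper names (list_parts, build_sample_package) are hypothetical, and the sample archive is a tiny stand-in built in memory, not a valid Word document:

```python
import io
import zipfile


def list_parts(docx_bytes):
    """Return the names of all entries stored in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.namelist()


def build_sample_package():
    """Build a tiny stand-in archive (hypothetical; not a valid Word file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


sample = build_sample_package()
parts = list_parts(sample)
print(parts)
```

For a real file, the same function can be given the bytes read from disk, for example open(path, "rb").read(), where path is whatever .docx you want to inspect.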

The main content of a docx file resides in word/document.xml.

The body of a typical word/document.xml looks like this:

<w:body>
  <w:p w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidRDefault="0059122C" w:rsidP="0059122C">
    <w:r>
      <w:t>Hello </w:t>
    </w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r w:rsidR="008B4316">
      <w:t>W</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd"/>
    <w:r>
      <w:t>orld</w:t>
    </w:r>
    <w:bookmarkStart w:id="0" w:name="_GoBack"/>
    <w:bookmarkEnd w:id="0"/>
  </w:p>
  <w:sectPr w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidSect="001A6335">
    <w:headerReference w:type="default" r:id="rId7"/>
    <w:footerReference w:type="default" r:id="rId8"/>
    <w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
    <w:cols w:space="720"/>
    <w:docGrid w:linePitch="360"/>
  </w:sectPr>
</w:body>

The w:body element wraps the whole document content, which is divided into multiple w:p elements (paragraphs). It ends with a w:sectPr element, which holds the section properties: the headers and footers used for that section, plus the page size, margins and columns.
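
The numbers on w:pgSz and w:pgMar in the sample above are expressed in twentieths of a point (often called twips), so there are 1440 of them per inch. A small sketch of the conversion, with a hypothetical helper name:

```python
TWIPS_PER_INCH = 1440  # 20 twips per point, 72 points per inch


def twips_to_inches(twips):
    """Convert a WordprocessingML measurement in twips to inches."""
    return twips / TWIPS_PER_INCH


# Values taken from w:pgSz and w:pgMar in the sample above
width_in = twips_to_inches(12240)
height_in = twips_to_inches(15840)
side_margin_in = twips_to_inches(1800)
print(width_in, height_in, side_margin_in)  # 8.5 11.0 1.25
```

In other words, the sample describes a US Letter page with 1.25 inch left and right margins.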

Inside a w:p, there are one or more w:r elements (runs). Each run carries its own formatting (text color, font size, ...) and typically contains one or more w:t elements (text parts).

As you can see, a simple sentence like Hello World might be split across several runs and w:t elements, which makes templating quite difficult to implement.
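
One common way around this when only the plain text is needed is to concatenate every w:t inside a paragraph, ignoring run boundaries. The sketch below does that with Python's standard library. The function name paragraph_texts and the inline sample are assumptions made for illustration, and the sketch ignores tabs, line breaks, fields and other non-text run content:

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Trimmed-down version of the sample above, wrapped in a root element
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t xml:space="preserve">Hello </w:t></w:r>
      <w:proofErr w:type="spellStart"/>
      <w:r><w:t>W</w:t></w:r>
      <w:proofErr w:type="spellEnd"/>
      <w:r><w:t>orld</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def paragraph_texts(document_xml):
    """Return one string per w:p, joining the text of all its w:t elements."""
    root = ET.fromstring(document_xml)
    texts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        pieces = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        texts.append("".join(pieces))
    return texts


print(paragraph_texts(SAMPLE))  # ['Hello World']
```

This recovers the text but not the formatting, so it helps with searching for placeholders rather than with replacing them in place.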

3020 questions

4 answers

Which PHP API or library is the best for converting from HTML to PDF and DOCX?

First, I tried to use Cloudconvert. It can convert between so many filetypes, but its PHP API causes memory leaks almost all the time. The second I tried was Pdfcrowd. It works perfectly, but it can convert only HTML to PDF. The third I tried was…

asked May 11 '14 at 15:09

aleskva

1 answer

In need of a clear example on how to get the word count of DOC and DOCX files

I am able to read a DOC file and get its word count, BUT it is wrong. My code: public class WordCounter { public static void main(String[] args) throws Throwable { processDOC(); } private static void processDOC() throws…

java apache apache-poi docx doc

asked May 05 '14 at 18:38

iCodeLikeImDrunk

3 answers

Adding image with docx4j to doc file

I'm trying to add an image to a docx file using the docx4j library on Android. I've run into an exception: E/AndroidRuntime(21818): java.lang.ExceptionInInitializerError E/AndroidRuntime(21818): at…

android ms-word docx docx4j

asked Apr 27 '14 at 19:15

kirik88

2 answers

Convert DOCX to HTML including IMAGES

I am using DOCX4J to convert the DOCX to HTML. I have successfully done the conversion and got the html format. I will be using the html format to embed it as EMAIL body to send an email. But I have some issues which are listed below.... Unable to…

java html docx docx4j

asked Apr 11 '14 at 06:47

user3522392

0 answers

open word document (.docx) without using external app in android

How do I open a .docx file without using external app in android? I have used Apache POI, OliveDocLibrary, Aspose.words libraries. using Apache POI, I have used required libraries, but still there is an exception while running the app (Exception…

android docx

asked Apr 09 '14 at 10:49

parijatha

1 answer

Possible to Insert page in word document with python-docx?

I just read through the documentation on python-docx. They mention several times that added content is created at the end of the document, but I didn't notice any way to alter this functionality. Does anyone know how to add a new page to a…

python ms-word docx python-docx

asked Apr 08 '14 at 23:25

Chockomonkey

1 answer

Apache POI - read docx with image in header

I'm trying to process docx file with Apache POI. Just simply read and then write file (just for now). Here is my simple code: FileInputStream fileInputStream = new FileInputStream(inputFile); XWPFDocument document = new…

java apache-poi docx

asked Apr 08 '14 at 11:29

Maciek Murawski

1 answer

how to insert an element using docx4j

I have a .docx document with some tables at the top. These contain text placeholders that need to be replaced, which works fine. However, one of these tables needs to be repeated and filled with different values. I am able to deep copy the table and add…

java docx docx4j

asked Mar 27 '14 at 20:00

user3170702

1 answer

Can you remove a text line in PHP with OpenTBS if there is nothing merged

I have been using OpenTBS for modifying DOCX files. When I am merging my DB info with the DOCX file it always leave blank data for any info in my DB that is missing. Here is an image of my 'source' document on the left and my merge on the right.…

php yii docx opentbs

asked Mar 07 '14 at 22:22

CGSmith105

0 answers

Compiling docx with py2exe

I'm compiling a python script to .exe via py2exe. I originally started compiling it and the entire program ran fine aside from the Word document creation. My logfile would give me: ERROR: Could not close or save Word Document 'docName.docx' :, so…

python py2exe docx

asked Mar 06 '14 at 23:18

signus

2 answers

Force docx file download via PHP corrupt

I know there are a lot of mentions of this but I have tried all the suggestions and nothing seems to work. I have this script to force download files, but when using docx formats it downloads ok but then says the file is corrupt. However word does…

php download docx

asked Feb 26 '14 at 08:30

user1572993

0 answers

How to remove specific pages in a word docx?

I would like to delete/remove specific pages in a word docx with Open XML SDK 2.0. I have tried using the code from here: blogs.msdn.com/b/brian_jones/archive/2009/06/15/removing-page-and-section-breaks-from-a-word-document.aspx It only deletes all…

asp.net openxml docx openxml-sdk

asked Feb 14 '14 at 03:50

Qwerty

0 answers

Where to find/how to extract image hyperlinks from zipped docx document?

If you zip a docx file, open it with winrar and click on view you can see the zip folder structure of the document. In word/_rels/document.xml.rels you can find the text hyperlinks but unfortunately not the links the images show to. Is there a way…

xml hyperlink zip extract docx

asked Feb 05 '14 at 10:22

user2718671

1 answer

change height of table row with C# Open XML and Word docx documents

I have a table in a docx file and I want to process it and change the height of a row. Here is my code so far: WordprocessingDocument wordDoc = WordprocessingDocument.Open("path_to_file", true) ; Table table =…

c# ms-word openxml docx

asked Jan 27 '14 at 19:03

TAL

1 answer

Function to get the content of a docx in php

private function read_docx($filename) { var_dump($filename); $striped_content = ''; $content = ''; $zip = zip_open($filename); if (!$zip || is_numeric($zip)) return false; while ($zip_entry = zip_read($zip)) { …

php zip docx

asked Jan 26 '14 at 10:11

Mohammed Sufian
