Questions tagged [docx]

.docx is the file extension for files created using the default format of Microsoft Word 2007 or later. Use this tag when you are working with .docx files programmatically, such as generating, editing, or extracting data from a .docx file.

The .docx format is Microsoft's Office Open XML WordprocessingML format: a zipped collection of Extensible Markup Language (XML) files. It is mostly standardized in ECMA-376 and ISO/IEC 29500.

Formerly, Microsoft Office used binary formats (.xls, .doc, .ppt); BIFF (Binary Interchange File Format) is the one used by .xls. Office now uses the OOXML (Office Open XML) formats (.xlsx, .xlsm, .docx, .docm, .pptx, .pptm), which are zipped XML.

.docx is the default Word format; it cannot contain any VBA (for security reasons, as stated by Microsoft).
.docm is the macro-enabled Word format, which can store VBA and execute macros.
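
A rough way to tell the two apart programmatically is to look for a VBA project part inside the package. The sketch below assumes the macro project is stored at word/vbaProject.bin, which is where Word conventionally puts it; the helper names and the in-memory sample packages are hypothetical and only serve the illustration.

```python
import io
import zipfile

# Assumed conventional location of the VBA project inside a Word package.
VBA_PART = "word/vbaProject.bin"


def has_vba_project(package_bytes: bytes) -> bool:
    """Return True if the zipped package contains the assumed VBA part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return VBA_PART in package.namelist()


def make_package(part_names: list[str]) -> bytes:
    """Build a stand-in package with empty parts (not a valid Word file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name in part_names:
            package.writestr(name, b"")
    return buffer.getvalue()


if __name__ == "__main__":
    plain = make_package(["word/document.xml"])
    macro = make_package(["word/document.xml", VBA_PART])
    print(has_vba_project(plain), has_vba_project(macro))
```

For a file on disk, the bytes could be read with open(path, "rb").read() before calling has_vba_project.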

A .docx file is a ZIP archive that contains the following folders and files:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //This folder contains most of the files that control the content of the document
|  +  document.xml //The actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Contains the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the document
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //This file tells Word where the images are located
+  [Content_Types].xml
\--_rels
   \  .rels
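
Because the package is an ordinary ZIP archive, its parts can be listed with any ZIP library. The following is a minimal sketch using Python's standard zipfile module; build_sample_package is a hypothetical helper that creates a tiny stand-in archive in memory (it is not a valid Word document), so the example runs without a real file.

```python
import io
import zipfile


def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the names of all parts stored in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.namelist()


def build_sample_package() -> bytes:
    """Build a tiny stand-in package in memory (placeholder XML only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_parts(build_sample_package()):
        print(name)
```

With a real document, the same function could be fed the bytes of the file, for example open("example.docx", "rb").read(), where example.docx is a placeholder name.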

The main content of a docx file resides in word/document.xml.

A typical word/document.xml looks like this (the enclosing w:document root element is omitted):

<w:body>
  <w:p w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidRDefault="0059122C" w:rsidP="0059122C">
    <w:r>
      <w:t>Hello </w:t>
    </w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r w:rsidR="008B4316">
      <w:t>W</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd"/>
    <w:r>
      <w:t>orld</w:t>
    </w:r>
    <w:bookmarkStart w:id="0" w:name="_GoBack"/>
    <w:bookmarkEnd w:id="0"/>
  </w:p>
  <w:sectPr w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidSect="001A6335">
    <w:headerReference w:type="default" r:id="rId7"/>
    <w:footerReference w:type="default" r:id="rId8"/>
    <w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
    <w:cols w:space="720"/>
    <w:docGrid w:linePitch="360"/>
  </w:sectPr>
</w:body>

The top-level tag is w:body (for the whole document), which is divided into multiple w:p (paragraphs), followed by a w:sectPr, which defines section properties such as the page size, the margins, and the headers/footers used for that document.
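
The w:pgSz values in the w:sectPr above are expressed in twentieths of a point, so 1440 units make one inch. As a small sketch of how these attributes might be read, the code below parses a trimmed-down body (SAMPLE is a shortened copy of the excerpt, and page_size_inches is a hypothetical helper name) and converts the page size to inches.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TWIPS_PER_INCH = 1440  # 20 units per point, 72 points per inch

# Shortened copy of the excerpt, with the namespace declared so it parses.
SAMPLE = f"""<w:body xmlns:w="{W_NS}">
  <w:sectPr>
    <w:pgSz w:w="12240" w:h="15840"/>
  </w:sectPr>
</w:body>"""


def page_size_inches(body_xml: str) -> tuple[float, float]:
    """Return (width, height) in inches from the first w:pgSz element."""
    root = ET.fromstring(body_xml)
    pg_sz = root.find(f".//{{{W_NS}}}pgSz")
    if pg_sz is None:
        raise ValueError("no w:pgSz element found")
    width = int(pg_sz.get(f"{{{W_NS}}}w")) / TWIPS_PER_INCH
    height = int(pg_sz.get(f"{{{W_NS}}}h")) / TWIPS_PER_INCH
    return width, height


if __name__ == "__main__":
    print(page_size_inches(SAMPLE))  # (8.5, 11.0), i.e. US Letter
```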

Inside a w:p, there are one or more w:r (runs). Each run carries its own formatting (text color, font size, ...), and each run contains one or more w:t (text parts).

As you can see, a simple sentence like Hello World might be split across multiple runs and w:t elements, which makes templating quite difficult to implement.
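
One common way around this, sketched below, is to rebuild each paragraph's text by concatenating all of its w:t elements before searching or replacing. SAMPLE is a shortened copy of the excerpt above and paragraph_texts is a hypothetical helper name. The sketch only reads text; writing a replacement back across several runs takes more care, and paragraphs nested inside other paragraphs (for example in text boxes) would be counted twice.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Shortened copy of the excerpt: "Hello World" split over three runs.
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t xml:space="preserve">Hello </w:t></w:r>
      <w:proofErr w:type="spellStart"/>
      <w:r><w:t>W</w:t></w:r>
      <w:proofErr w:type="spellEnd"/>
      <w:r><w:t>orld</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def paragraph_texts(document_xml: str) -> list[str]:
    """Return the plain text of every w:p, joining all of its w:t parts."""
    root = ET.fromstring(document_xml)
    texts = []
    for paragraph in root.iterfind(".//w:p", NS):
        parts = [t.text or "" for t in paragraph.iterfind(".//w:t", NS)]
        texts.append("".join(parts))
    return texts


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE))  # ['Hello World']
```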

3020 questions

1 answer

Apache POI Word .DOC Replacing Text

I would like to open a .doc file search for some text and replace it with other text. I know of the RANGE.replaceText(placeholder, newString) method but it is unreliable when you have mergfields, or other special formatting in the document and can…

asked Jun 23 '13 at 15:11

user2020457

3 answers

PDF compression How does Adobe do it?

This is a bit more of a fun question than a serious one, but how does the Adobe PDF format make documents so... portable? I just created a small Word document, 235kb in size, containing multiple color photos and a few textual phrases. A PDF…

pdf filesize docx

asked Nov 11 '09 at 22:16

NickSentowski

1 answer

Java DOCX file Viewer

Currently I'm developing an application that allows users to create a template and generate it into a DOCX file. The application needs to be able to display to users the changes in the template as the user is creating it. The approach I tried was…

java docx docx4j icepdf

asked Jun 17 '13 at 20:35

Leandro Santos

1 answer

POI docx paragraph outline parsing

I have a very simple issue that is driving me crazy. Basically I want to extract, via POI/DOCX4J libraries, docx paragraph structure and document outline. I did the same task with a normal doc document using the POI paragraph.getLvl() method. Is…

java ms-word apache-poi docx docx4j

asked May 28 '13 at 09:49

YoBre

2 answers

How generate docx/odt file with math formulas from java

Good day. I must generete docx or odt file with many math formulas inside. i try to find solution in Apashe POI & ODFtoolkit but i am not was able. google doesn't help. ( May be anybody can help me with solution in this task? (any example?) Thanks.

java math docx formulas odt

asked May 28 '13 at 06:04

Aleksandr Yudin

2 answers

PHP xPath docx parsing

I am trying to open up a Word 2007 document (docx), I unzip it successively but I am having an issue with the xPath portion of the code. I want to iterate each element and grab the text within the element. In the current example below I am trying…

php xml xpath ms-word docx

asked May 03 '13 at 16:49

Anderson

0 answers

Phpdocx word documents are corrupt when adding images

I'm using Phpdocx 2.5 to convert html to docx. I'm using the embedHTML method with 'downloadImages' parameter set to true; When the html doesn't contain any images, document is generated just fine. When images are added, the resulting document…

php docx phpdocx

asked May 02 '13 at 15:14

Biggie Mac

1 answer

How to add item transform to VS2012 .proj msbuild file

Based off this answer describing an item transform to convert image files from jpg to png, I made an item transform that converts .docx file to .pdf. When I call it from my projectname.proj build file I get this error message: Error 1 The…

pdf msbuild docx

asked Apr 24 '13 at 20:58

Pauli Price

1 answer

Export from Java EE + Struts2 to DOC files

Someone knows any java library that allows me to export information to doc format, I appreciate variety. My project is using Java EE and STRUTS2. So I need to evaluate and to compare the options. For example JASPERREPORTS.

jakarta-ee struts2 jasper-reports docx doc

asked Apr 24 '13 at 12:02

villanueva.ricardo

1 answer

OpenTBS Multiple pages of repeated template containing table

Alright, I'm new to XML and OpenTBS so this concept of blocks etc is very confusing for me, and when I thought I had the gist of it, my client asked for even more of me. I've got a table of customers and their items, the client wants one single…

php docx opentbs

asked Apr 10 '13 at 21:47

PwnageAtPwn

1 answer

Get Xml Text node ID

I'm trying to parse through the document.xml file of a .docx file. I would like to search for Text and then return the node that text is located so I can then move up to the parent node and insert a new node type. This is what I have so far, I have…

c# xml xml-parsing docx xmlnode

asked Apr 09 '13 at 14:57

user1704863

0 answers

Python sends corrupt .docx as email attachment (google app engine)

I want to send an email from python with: thedoc = generate_doc() mail.send_mail(sender="Support", to="user@mail.co.uk", subject="RE: ref", attachments=('thedoc.docx', thedoc), body="""Blah…

python google-app-engine email docx zip

asked Apr 02 '13 at 13:27

Awalias

0 answers

Does docx4j convert xhtml to docx in memory?

I'm trying to convert xhtml file to docx and find following example code: wordMLPackage.getMainDocumentPart().getContent().addAll(XHTMLImporter.convert(new File(inputfilepath), null, wordMLPackage) ); wordMLPackage.save(new…

xhtml docx docx4j

asked Mar 27 '13 at 12:42

simpletosimple

2 answers

.net program to parse .doc file

I want to create an application which will be able to parse doc/docx files structure of this file is shown bellow: par-000.01 - some content par-000.21 - some content par-000.31 - some content par-001.32 - some content content could be multi line…

c# .net parsing docx doc

asked Mar 12 '13 at 18:23

Mithrand1r

0 answers

How to convert DocX document to Microsoft.Office.Interop.Word.Document?

I want to convert or typecaste an existing DocX word doument to Microsoft.Office.Interop.Word.Document. static DocX g_document; .... .... function DoSomething() { g_document = DocX.Load(@"C:\Users\RetailWrite.docx"); …

c# c#-4.0 ms-word docx

asked Feb 27 '13 at 08:38

Newton Sheikh
