Questions tagged [docx]

.docx is the file extension for files created using the default format of Microsoft Word 2007 or higher. Use this tag when you are working with .docx files programmatically, such as generating a .docx file, extracting data from one, or editing one.

.docx files use the Microsoft Office Open XML WordprocessingML format, which is based around a zipped collection of eXtensible Markup Language (XML) files. The format is mostly standardized in ECMA-376 and ISO/IEC 29500.

Formerly, Microsoft Office used binary file formats (.xls, .doc, .ppt), such as BIFF (Binary Interchange File Format) for Excel. It now uses the OOXML (Office Open XML) formats. These files (.xlsx, .xlsm, .docx, .docm, .pptx, .pptm) are zipped XML.

.docx is the new default Word format; it cannot contain any VBA (for security reasons, as stated by Microsoft).
.docm is the new macro-enabled Word format, which can store VBA and execute macros.

A .docx file is a ZIP archive that contains the following folders and files:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //This folder contains most of the files that control the content of the document
|  +  document.xml //The actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Contains the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the Word document
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //This file tells Word where the images are located
+  [Content_Types].xml
\--_rels
   \  .rels
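
The mapping held in word/_rels/document.xml.rels can be read with any XML parser. The sketch below is only an illustration: the sample relationships part is hand-written (the rId values and targets are assumptions chosen to match the tree above), and read_relationships is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Namespace used by relationship parts (.rels files) in an OOXML package.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hand-written sample (assumption); a real one lives in word/_rels/document.xml.rels.
SAMPLE_RELS = f"""<Relationships xmlns="{RELS_NS}">
  <Relationship Id="rId5" Type="{REL_TYPE}/image" Target="media/image1.jpeg"/>
  <Relationship Id="rId8" Type="{REL_TYPE}/footer" Target="footer1.xml"/>
</Relationships>"""


def read_relationships(rels_xml):
    """Map each relationship Id (e.g. 'rId8') to its Target path."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.findall(f"{{{RELS_NS}}}Relationship")
    }


if __name__ == "__main__":
    for rel_id, target in read_relationships(SAMPLE_RELS).items():
        print(rel_id, "->", target)
```

Targets are relative to the folder of the part that owns the relationships, so media/image1.jpeg here refers to word/media/image1.jpeg.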

The main content of a docx file resides in word/document.xml.
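
Because the package is an ordinary ZIP archive, its parts can be inspected with a standard ZIP library. The following is a minimal sketch using Python's zipfile module; build_sample_package, list_parts and read_part are hypothetical helper names, and the in-memory archive only mimics the layout of a .docx (it is not a valid Word document).

```python
import io
import zipfile


def build_sample_package():
    """Create a tiny in-memory ZIP laid out like a .docx (not a valid Word file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("_rels/.rels", "<Relationships/>")
        archive.writestr("word/document.xml", "<w:document/>")
    buffer.seek(0)
    return buffer


def list_parts(package):
    """Return the names of all entries in the package."""
    with zipfile.ZipFile(package) as archive:
        return archive.namelist()


def read_part(package, part_name):
    """Return the raw bytes of one entry, e.g. word/document.xml."""
    with zipfile.ZipFile(package) as archive:
        return archive.read(part_name)


if __name__ == "__main__":
    package = build_sample_package()
    print(list_parts(package))
    print(read_part(package, "word/document.xml").decode("utf-8"))
```

With a real document, the path of the .docx file can be passed in place of the in-memory buffer.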

The w:body element of a typical word/document.xml looks like this:

<w:body>
  <w:p w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidRDefault="0059122C" w:rsidP="0059122C">
    <w:r>
      <w:t>Hello </w:t>
    </w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r w:rsidR="008B4316">
      <w:t>W</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd"/>
    <w:r>
      <w:t>orld</w:t>
    </w:r>
    <w:bookmarkStart w:id="0" w:name="_GoBack"/>
    <w:bookmarkEnd w:id="0"/>
  </w:p>
  <w:sectPr w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidSect="001A6335">
    <w:headerReference w:type="default" r:id="rId7"/>
    <w:footerReference w:type="default" r:id="rId8"/>
    <w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
    <w:cols w:space="720"/>
    <w:docGrid w:linePitch="360"/>
  </w:sectPr>
</w:body>

The w:body element wraps the whole document content, which is split into multiple w:p elements (paragraphs). It ends with a w:sectPr element, which defines section properties such as the page size, the margins, and the headers/footers used for that document.
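
The w:pgSz and w:pgMar values are expressed in twentieths of a point (1440 units per inch), so the sample above describes a US Letter page. A small sketch of the conversion, using the attribute values from the sample; twips_to_inches is a hypothetical helper name:

```python
TWIPS_PER_INCH = 1440  # 20 units per point, 72 points per inch


def twips_to_inches(value):
    """Convert a w:pgSz / w:pgMar attribute value to inches."""
    return int(value) / TWIPS_PER_INCH


# Attribute values copied from the w:sectPr sample.
page_size = {"w": "12240", "h": "15840"}
margins = {"top": "1440", "right": "1800", "bottom": "1440", "left": "1800"}

if __name__ == "__main__":
    width = twips_to_inches(page_size["w"])    # 8.5
    height = twips_to_inches(page_size["h"])   # 11.0
    print(f"page: {width} x {height} in")
    for side, value in margins.items():
        print(f"{side} margin: {twips_to_inches(value)} in")
```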

Inside a w:p, there are one or more w:r elements (runs). Every run defines its own style (text color, font size, ...) and contains one or more w:t elements (text parts).

As you can see, a simple sentence like Hello World might be split across several runs and w:t elements, which makes templating quite difficult to implement.
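
One common way to cope with split text is to join the text of all w:t elements inside each paragraph before searching or replacing. The sketch below does this with Python's standard XML parser on a trimmed copy of the sample; paragraph_texts is a hypothetical helper name, and the approach ignores tabs, line breaks and other non-text run content.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Trimmed version of the sample above, wrapped in a root element with the namespace.
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t>Hello </w:t></w:r>
      <w:proofErr w:type="spellStart"/>
      <w:r><w:t>W</w:t></w:r>
      <w:proofErr w:type="spellEnd"/>
      <w:r><w:t>orld</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def paragraph_texts(document_xml):
    """Return one string per w:p, built by joining all of its w:t elements."""
    root = ET.fromstring(document_xml)
    texts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        pieces = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        texts.append("".join(pieces))
    return texts


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE))  # ['Hello World']
```

This only reads the text; writing a replacement back requires deciding which run receives the new text and its formatting.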

3020 questions
1
vote
1 answer

how to Wrap Text across an image in docx file in c#

I am making a docx generator using Docx.dll. So far I have been able to insert images and text into the document. The images and paragraphs are not aligned. I need to wrap text around the image. How do I do it? I looked for it on Google and found this…
Newton Sheikh
  • 1,376
  • 2
  • 19
  • 42
1
vote
0 answers

Document created by Novacode docx is very big and seems not compressed

I have created a new document with novacode docx. Document consists of about 30 documents inserted into main document. Each sub document is compressed and very small(about 10 KB); Everything is OK except result document is very big (about 5 MB) .…
h.Ebrahimi
  • 87
  • 1
  • 7
1
vote
1 answer

Special characters for Docx with ooxml

I am converting HTML to docx using http://www.codeproject.com/Articles/91894/HTML-as-a-Source-for-a-DOCX-File. Most of the characters are read properly but some special characters such as •,“ ” are being displayed as •. What should I be doing to…
San
  • 1,797
  • 7
  • 32
  • 56
1
vote
1 answer

Getting paragraph count from Tika for both Word and PDF

I have a scenario where I need to reconcile two documents, a Word (.docx) doc as well as a PDF. The two are supposed to be "identical" to each other (the PDF is just a PDF version of the DOCX file); meaning they should contain the same text,…
user1768830
1
vote
3 answers

Added Word Doc to CVS - became corrupt

I'm using CVSNT. I added a Microsoft 2007 docx file "as text" to the repository. After committing and before updating I tried to open the file again but was unable to. It said it was corrupt. I tried using the office word doc recovery and that was…
Ethan
1
vote
1 answer

How to hide confirmConversion dialog when open docx file with help 2003

I use Eclipse SWT in my application. It allows me to open Word files (doc, docx, rtf). But I cannot hide the "Confirm Conversion at Open" dialog programmatically when opening a docx file (doc or rtf open fine). Windows XP SP3, Microsoft Word 2003 SP3 and…
Ptr
  • 11
  • 2
1
vote
2 answers

Write and read hidden tags in .docx documents using .Net

What I would like to do is to be able to write some hidden marks in the document, so that when the user fills in some information, then I can process each part of the document according to the marks or sections that surrounded it. I'm using .NET,…
Rafael
  • 1,099
  • 5
  • 23
  • 47
1
vote
1 answer

How to print table header on every page when generating docx?

My template is simple. Table header and: [a.name;block=w:tr] [a.version] [a.description] I managed to create a table, but I am wondering if I can repeat the table header on every document page.
Codium
  • 3,200
  • 6
  • 34
  • 60
1
vote
1 answer

Reading a DOCX file from stream

I wrote the following code for posting two file (an XML and a DOCX file) into a webservice: public void postMultipleFiles(string url, string[] files) { string boundary = "----------------------------" + DateTime.Now.Ticks.ToString("x"); …
user1976596
1
vote
4 answers

doc, docx conversion to pdf using php

i've been searching for quite a long now to convert a word document (.doc & .docx ) to pdf.....my application is about taking a word document from clients than converting them to a pdf with added changes ( like header, footer ) to the original…
1
vote
1 answer

docx generation: putting elements inside ffData (CTFFData)

I'm using docx4j library to generate a docx file. I need to put couple of other elements inside the w:ffData tag, eventually creating a structure like this:
geca
  • 2,711
  • 2
  • 17
  • 26
1
vote
0 answers

Is there any PHP library support for converting from DOCX to DOC file formats? Even better any MSO-X to the original format

I'm developing a system where users will be uploading documents. It will only support MS Office files 2003 and earlier so it currently barfs if the user uploads any of the x files (docx, xlsx, pptx). I've found PHPWord in another SO question, but…
Dan
  • 3,246
  • 1
  • 32
  • 52
1
vote
1 answer

When I open a docx in hex viewer, can someone explain what i'm seeing

I wanted to learn about what I was looking at when I open a DOCX file in a hexadecimal viewer. For example: Hexadecimal is base 16 on a 32bit (DWORD) file?. So I was assuming that starting from right to left you would do: 0*16^0 + 0*16^1 + 6*16^2 +…
Jimmyt1988
  • 20,466
  • 41
  • 133
  • 233
1
vote
1 answer

How change table contents and style using python-docx?

I found python-docx, it looks very smart, but I have to do some tasks that are not well documented. I need to open a .docx template, with a table within, and for all the instances present in a list previously created, I have to format them in the…
Mychot sad
  • 380
  • 4
  • 9
1
vote
1 answer

docx4j: help converting docx to PDF

My goal is to take an existing .docx file and convert it, from a Linux command-line, to PDF using docx4j (http://www.docx4java.org). The "getting started" guide…
brianjcohen
  • 965
  • 2
  • 10
  • 14