Using XmlUtils.marshalToString() from docx4j, I have the following content at identical locations in two docx files (extracted from the corresponding word/document.xml after unzipping each .docx). These are the only differences between the files:

 <w:t xml:space="preserve">New line.  First is </w:t>

and

 <w:t xml:space="preserve">
   <w:r>
     <w:t xml:space="preserve">New line.</w:t>
   </w:r>
   <w:r>
     <w:t xml:space="preserve">  First is </w:t>
   </w:r>
 </w:t>

In the first document, the <w:t> node is output as above.

However, in the second, an empty <w:t> node is printed as follows:

   <w:t xml:space="preserve"></w:t>

I checked the w:t schema at http://www.schemacentral.com/sc/ooxml/e-w_p-1.html and w:r is a valid contained element.

Edit: the link above is the schema for the w:p element, not w:t. The correct link for w:t is: http://www.schemacentral.com/sc/ooxml/e-w_t-1.html. It clearly shows that the only acceptable content for w:t is a string (not a w:r or any other element). Consequently (as Jason's answer below suggests), the XML in document.xml was invalid and so was not unmarshalled by docx4j. As a result, the text was not available for output by XmlUtils.marshalToString().
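For anyone wanting to confirm this kind of problem before loading a file into docx4j, here is a minimal sketch that uses only the JDK's DOM parser (not docx4j) to count w:t elements that wrongly contain child elements. The class name NestedTextCheck and the method countTextElementsWithChildElements are made up for illustration, and the w:p wrapper around the snippets is an assumption added so that the fragments parse as standalone XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class NestedTextCheck {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Counts w:t elements that have at least one child element (w:t may only hold text). */
    static int countTextElementsWithChildElements(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the namespace-based lookup below
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Returns every w:t in the tree, including ones nested inside another w:t.
        NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
        int invalid = 0;
        for (int i = 0; i < texts.getLength(); i++) {
            for (Node child = texts.item(i).getFirstChild();
                 child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    invalid++;
                    break; // one offending child is enough to flag this w:t
                }
            }
        }
        return invalid;
    }

    public static void main(String[] args) throws Exception {
        String open = "<w:p xmlns:w=\"" + W_NS + "\">";
        String good = open
                + "<w:r><w:t xml:space=\"preserve\">New line.  First is </w:t></w:r></w:p>";
        String bad = open
                + "<w:r><w:t xml:space=\"preserve\">"
                + "<w:r><w:t xml:space=\"preserve\">New line.</w:t></w:r>"
                + "<w:r><w:t xml:space=\"preserve\">  First is </w:t></w:r>"
                + "</w:t></w:r></w:p>";
        System.out.println(countTextElementsWithChildElements(good));
        System.out.println(countTextElementsWithChildElements(bad));
    }
}
```

Run against the two snippets from the question, this reports no offending w:t for the first and one (the outer w:t) for the second. It is only a structural check on this one rule, not a full schema validation.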

What is keeping the second block from being output?

Jacob Zwiers

1 Answer

You can trust marshalToString.

If it is returning an empty w:t, that's because the underlying org.docx4j.wml.Text object has a null or empty value field.

You need to look at whatever code is supposed to be populating that.

JasonPlutext
  • I looked at the output produced when the docx was read (triggered by `WordprocessingMLPackage.load`) and found this: `WARN org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 90 - [ERROR] : unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"r"). Expected elements ar`. That made me look again at the link above, and I've added a mea culpa to the original question. Short answer: the XML we're trying to read from the docx's document.xml is invalid. – Jacob Zwiers Jul 12 '12 at 15:23