I'm quite new to docx4j. After installing everything, I tried creating an empty .docx file and then writing text to it. Here's the code:

WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
wordMLPackage.getMainDocumentPart().addParagraphOfText("Hello Word!");
wordMLPackage.save(new java.io.File("HelloWord1.docx"));

The file is successfully created, but when I try to open it with Word 2010, I get an error message saying the file is corrupted. However, when I open it with WordPad, everything is fine and the text is there. What can I do to solve this problem and open my created documents with Word 2010?

EDIT: I renamed the corrupted file to .zip; here's document.xml:

<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<w:document mc:Ignorable="w14 w15" xmlns:ns32="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas" xmlns:ns31="http://schemas.openxmlformats.org/drawingml/2006/compatibility" xmlns:ns30="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns:odgm="http://opendope.org/SmartArt/DataHierarchy" xmlns:odi="http://opendope.org/components" xmlns:oda="http://opendope.org/answers" xmlns:odq="http://opendope.org/questions" xmlns:odc="http://opendope.org/conditions" xmlns:odx="http://opendope.org/xpaths" xmlns:ns23="http://schemas.microsoft.com/office/2006/coverPageProps" xmlns:ns21="urn:schemas-microsoft-com:office:powerpoint" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:ns17="urn:schemas-microsoft-com:office:excel" xmlns:dsp="http://schemas.microsoft.com/office/drawing/2008/diagram" xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:dgm="http://schemas.openxmlformats.org/drawingml/2006/diagram" xmlns:ns12="http://schemas.openxmlformats.org/drawingml/2006/chartDrawing" xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:ns9="http://schemas.openxmlformats.org/schemaLibrary/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p w14:textId="bde3dbce" w14:paraId="bde3dbce">
      <w:pPr>
        <w15:collapsed w:val="false"/>
      </w:pPr>
      <w:r>
        <w:t>Hello Word!</w:t>
      </w:r>
    </w:p>
    <w:sectPr>
      <w:pgSz w:w="12240" w:code="1" w:h="15840"/>
      <w:pgMar w:left="1440" w:bottom="1440" w:right="1440" w:top="1440"/>
    </w:sectPr>
  </w:body>
</w:document>

EDIT 2: After hours of scratching my head over this, I completely uninstalled docx4j, deleted all references to it, and then re-added the JAR files. For some reason, the problem went away after that.

ArcDexx
  • The answer below is probably the solution to my problem. But does anyone know how I can programmatically change the standalone attribute in my XML files? Any workaround would be greatly appreciated as well... – ArcDexx Jun 18 '14 at 15:14

1 Answer

I ran this exact code, using the current stable release of docx4j (v3.1) with no issues. A document was created and opened just fine in MS Word 2010. Here's the complete content of my test class, which creates a Word file in a 'test' directory in my C drive (adjust for your machine / OS, obviously):

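import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class PlayDocx4J {

    public static void main(String[] args) {

        try {
            // Create an empty package that already contains a main document part
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
            // Append a single paragraph of text
            wordMLPackage.getMainDocumentPart().addParagraphOfText("Hello Word!");
            // Write the .docx; the target directory must already exist
            wordMLPackage.save(new java.io.File("c:/test/helloword.docx"));
        } catch (Docx4JException e) {
            System.err.println("ERROR " + e.getMessage());
            e.printStackTrace();
        }
    }
}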

Update re XML: having examined the document.xml content you provided, the issue is with the XML declaration. As it stands, it triggers a corrupt-document error in Word, and Word also tells you where the problem is (line 1, column 54). If you remove the standalone="true" attribute and then put the edited document.xml back into the zip file, it opens without any issue.
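If you would rather not edit the zip by hand, the same manual fix can be scripted. The following is only a rough sketch using the JDK's own zip classes, not anything provided by docx4j: the class name `StandaloneFixer` and its methods are made up for illustration, it assumes every XML part is UTF-8 encoded with no byte order mark (as the declaration in the question suggests), and it copies the package to a new file rather than editing in place. Whether this is enough for Word to accept a given file is not guaranteed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not part of docx4j): copies a .docx and drops the
// standalone pseudo-attribute from the XML declaration of each XML part.
final class StandaloneFixer {

    // Matches an XML declaration at the very start of the text and captures
    // what comes before and after the standalone pseudo-attribute.
    private static final Pattern DECL = Pattern.compile(
            "^(<\\?xml[^>]*?)\\s+standalone\\s*=\\s*(\"[^\"]*\"|'[^']*')([^>]*\\?>)");

    // Returns the XML with standalone removed from its declaration, or the
    // input unchanged if there is no such declaration.
    static String stripStandalone(String xml) {
        return DECL.matcher(xml).replaceFirst("$1$3");
    }

    // Copies every entry of the input package to the output package,
    // rewriting the declaration of .xml and .rels parts on the way.
    static void fixPackage(Path in, Path out) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(in));
             ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(out))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = readAll(zin);
                String name = entry.getName();
                if (name.endsWith(".xml") || name.endsWith(".rels")) {
                    // Assumption: parts are UTF-8 encoded
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = stripStandalone(xml).getBytes(StandardCharsets.UTF_8);
                }
                zout.putNextEntry(new ZipEntry(name));
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    // Reads the current zip entry fully into memory.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StandaloneFixer <in.docx> <out.docx>");
            return;
        }
        fixPackage(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Run it as `java StandaloneFixer HelloWord1.docx HelloWord1-fixed.docx` and try opening the second file in Word. This only patches the symptom; the declaration will come back on the next save until the underlying XML setup is sorted out.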

This raises the question of why your generated file contains this declaration. It shouldn't: the XML specification only allows standalone="yes" or standalone="no", so "true" is not a valid value in any case. The answer must lie with the XML transformer being used in your Java implementation.
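One way to start narrowing this down is to print which XML implementations your JVM actually picks up. This is a small diagnostic sketch, not a fix: the class name `XmlStackReport` is made up, it only uses standard JDK calls, and it looks for the classic `javax.xml.bind.JAXBContext` class by name, which is an assumption about which JAXB API is on your classpath.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic: reports the Java version and the XML
// implementation classes that the current classpath resolves to.
final class XmlStackReport {

    // Returns where a class was loaded from, or "not found" if it is absent.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false,
                    XmlStackReport.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source
            return src == null
                    ? "no code source (likely bundled with the JDK)"
                    : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    // Builds a short multi-line report of the XML stack in use.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("java.version = ")
          .append(System.getProperty("java.version")).append('\n');
        sb.append("TransformerFactory = ")
          .append(TransformerFactory.newInstance().getClass().getName()).append('\n');
        sb.append("DocumentBuilderFactory = ")
          .append(DocumentBuilderFactory.newInstance().getClass().getName()).append('\n');
        sb.append("javax.xml.bind.JAXBContext = ")
          .append(locate("javax.xml.bind.JAXBContext")).append('\n');
        return sb.toString();
      }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Run it with the same classpath as the failing project. If a factory class comes from an unexpected JAR rather than the JDK, that JAR is a reasonable first suspect.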

(You can read more about this declaration here: http://www.xmlplease.com/xml/xmlquotations/standalone).

Ben
  • It's really weird. The file is created, and it seems to be fine since WordPad opens it with no problem. Could it be my MS Word version ? I tried cleaning projects, restarting Eclipse and computer... Since this doesn't seem to be a code problem I really have no clue how to solve this problem. – ArcDexx Jun 18 '14 at 09:55
  • Weird. I opened the file in Word 2010 just fine. What version of docx4j are you using? Do you have error-trapping around your code? One thing you can do: change the file extension to .zip, open the zip, open the 'word' folder therein, and extract `document.xml` -- paste the content of that as code in your original question, and I'll take a look -- if the file's truly corrupt it should be evident from that. – Ben Jun 18 '14 at 09:57
  • I'm also running the latest version of docx4j. What do you mean by error-trapping? I pasted the content of document.xml; I hope you can find the solution from there. – ArcDexx Jun 18 '14 at 10:12
  • OK, source of the problem found -- see updated answer. – Ben Jun 18 '14 at 10:38
  • I don't know what went wrong, but somehow the problem is still there. I deleted the "standalone" attribute, didn't work. So I tried deleting the "standalone" attributes in ALL the XML files from my docx, still didn't work. I did some tests, and the loading/processing part works fine, I think the corruption happens when the save method is called... Any thoughts on the matter ? – ArcDexx Jun 18 '14 at 12:53
  • Yes, your XML processor is adding in the dodgy directive, which renders the resulting docx file corrupt. – Ben Jun 18 '14 at 20:49
  • Which Java are you using? (java -version). And which JAXB? – JasonPlutext Jun 19 '14 at 14:12