3

I want to convert a docx file to HTML. I started from the examples on GitHub. So far I only have the loading part, and it already fails.

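import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class Main {

    public static void main(String[] args) throws Docx4JException, FileNotFoundException {
        String inputfilepath = "myfilepathhere";

        // Output stream for the HTML result (not used yet; only loading is shown)
        OutputStream os = new FileOutputStream(inputfilepath + ".html");

        // Load the docx package; this is the call that throws
        WordprocessingMLPackage wordMLPackage = Docx4J
                .load(new FileInputStream(inputfilepath));

    }
}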

I get a NullPointerException. From the exception trace and the source code on GitHub, I suspect it is related to JAXB, specifically this class: https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/jaxb/Context.java

Docx4j source code is available at https://github.com/plutext/docx4j.

Exception trace:

Exception in thread "main" org.docx4j.openpackaging.exceptions.Docx4JException: Couldn't get [Content_Types].xml from ZipFile
    at org.docx4j.openpackaging.io3.Load3.get(Load3.java:134)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:454)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:371)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:337)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:302)
    at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:170)
    at org.docx4j.Docx4J.load(Docx4J.java:195)
    at Main.main(Main.java:29)
Caused by: org.docx4j.openpackaging.exceptions.InvalidFormatException: Bad [Content_Types].xml
    at org.docx4j.openpackaging.contenttype.ContentTypeManager.parseContentTypesFile(ContentTypeManager.java:713)
    at org.docx4j.openpackaging.io3.Load3.get(Load3.java:132)
    ... 7 more
Caused by: java.lang.NullPointerException
    at org.docx4j.openpackaging.contenttype.ContentTypeManager.parseContentTypesFile(ContentTypeManager.java:679)
    ... 8 more

The docx document is valid (created by Word 2010). I even unzipped it to check that [Content_Types].xml is there, and it is.

I'm using Eclipse and Java SE 7. I've added all the required jar files to Java build path in project properties.

Please help me.

Update:

I added this line from Context.java to my class to see whether that is the problem:

     JAXBContext.newInstance("org.docx4j.openpackaging.contenttype");

I could see the following exception in my console:

Exception in thread "main" javax.xml.bind.JAXBException: Provider org.eclipse.persistence.jaxb.JAXBContextFactory not found
 - with linked exception:
[java.lang.ClassNotFoundException: org.eclipse.persistence.jaxb.JAXBContextFactory]
    at javax.xml.bind.ContextFinder.newInstance(Unknown Source)
    at javax.xml.bind.ContextFinder.find(Unknown Source)
    at javax.xml.bind.JAXBContext.newInstance(Unknown Source)
    at javax.xml.bind.JAXBContext.newInstance(Unknown Source)
    at javax.xml.bind.JAXBContext.newInstance(Unknown Source)
    at Main.main(Main.java:26)
Caused by: java.lang.ClassNotFoundException: org.eclipse.persistence.jaxb.JAXBContextFactory
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at javax.xml.bind.ContextFinder.safeLoadClass(Unknown Source)
    ... 6 more
pinkpanther
  • If you upload your docx to the docx4j webapp (or download the Word AddIn), do those instances of docx4j load your docx successfully? – JasonPlutext Mar 24 '15 at 11:10
  • @JasonPlutext I uploaded my doc to http://webapp.docx4java.org/OnlineDemo/docx_to_pdf.html and clicked process and I got pdf correctly. – pinkpanther Mar 24 '15 at 16:22
  • Please try http://webapp.docx4java.org/OnlineDemo/PartsList.html instead – JasonPlutext Mar 24 '15 at 23:10
  • @JasonPlutext I tried, the page has shown [Content_Types].xml and parts information. – pinkpanther Mar 25 '15 at 05:02
  • If you turn logging on, what does docx4j output before the stack trace? Something like http://stackoverflow.com/questions/12363169/docx4j-no-suitable-jaxb-implementation-available-runtime-error-java-1-5 or http://www.docx4java.org/forums/docx-java-f6/invalidformatexception-by-using-docx4j-with-eclipse-t807.html the root cause for both of which is no JAXB implementation present – JasonPlutext Mar 25 '15 at 06:41
  • @JasonPlutext I'm not sure how to do that logging thing...so I did the thing I've mentioned in above update. The exception is `java.lang.ClassNotFoundException: org.eclipse.persistence.jaxb.JAXBContextFactory`. Please see my update. thanks – pinkpanther Mar 25 '15 at 06:56

3 Answers

2

docx4j supports several different JAXB implementations:

  • the reference implementation
  • the one Sun/Oracle include in Java 6/7/8
  • EclipseLink MOXy

If you want to use MOXy, you need:

  1. the relevant EclipseLink jars
  2. docx4j-MOXy-JAXBContext-3.0.0.jar (which just contains the jaxb.properties files)

The jaxb.properties files just say:

javax.xml.bind.context.factory=org.eclipse.persistence.jaxb.JAXBContextFactory

If you are using Maven, you just need to add:

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-MOXy-JAXBContext</artifactId>
    <version>3.0.0</version>
</dependency>
<dependency>
    <groupId>org.eclipse.persistence</groupId>
    <artifactId>org.eclipse.persistence.moxy</artifactId>
    <version>2.5.1</version>
</dependency>

Is the docx4j-MOXy-JAXBContext jar on your classpath? If so, either remove it or add the relevant EclipseLink jars.
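
If you are not sure what is on the classpath, a small check along these lines may help. It is only a sketch: the class name `JaxbClasspathCheck` and its methods are made up for illustration and are not part of docx4j. It assumes the standard JAXB lookup, where a `jaxb.properties` file sits in the package named by the context path (here `org.docx4j.openpackaging.contenttype`, the one from the question's update), and it reports whether the factory class named in that file can actually be loaded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Properties;

// Hypothetical helper, not part of docx4j: reports which JAXB factory a
// jaxb.properties file on the classpath asks for, and whether it is loadable.
public class JaxbClasspathCheck {

    // Property key used in the jaxb.properties file quoted above
    public static final String FACTORY_KEY = "javax.xml.bind.context.factory";

    // Maps a JAXB context package to the jaxb.properties resource path inside it
    public static String propertiesResourceFor(String contextPackage) {
        return contextPackage.replace('.', '/') + "/jaxb.properties";
    }

    // True if the class can be found by the given loader (without initializing it)
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns a one-line description of what the classpath says about this package
    public static String diagnose(String contextPackage, ClassLoader loader) {
        String resource = propertiesResourceFor(contextPackage);
        URL url = loader.getResource(resource);
        if (url == null) {
            return "no " + resource + " found; no factory is forced for this package";
        }
        Properties props = new Properties();
        try (InputStream in = url.openStream()) {
            props.load(in);
        } catch (IOException e) {
            return "could not read " + url + ": " + e.getMessage();
        }
        String factory = props.getProperty(FACTORY_KEY);
        if (factory == null) {
            return url + " does not set " + FACTORY_KEY;
        }
        return isLoadable(factory, loader)
                ? url + " requests " + factory + ", which is loadable"
                : url + " requests " + factory + ", which is MISSING from the classpath";
    }

    public static void main(String[] args) {
        System.out.println(diagnose("org.docx4j.openpackaging.contenttype",
                JaxbClasspathCheck.class.getClassLoader()));
    }
}
```

If it reports a missing factory, the printed URL shows which jar supplied the `jaxb.properties` file, so you know which jar to remove or which implementation to add.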

JasonPlutext
  • Yes....thank you...Actually I've included all the libraries under docx4j-3.2.1 and also optional libraries..that's the problem... – pinkpanther Mar 25 '15 at 09:30
0

This works for me; try it:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.Docx4jProperties;
import org.docx4j.convert.out.HTMLSettings;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class Test {
    public static void main(String[] args) throws Docx4JException,
            FileNotFoundException {
        String inputfilepath = "c:/file.docx";


        WordprocessingMLPackage wordMLPackage = Docx4J
                .load(new FileInputStream(inputfilepath));

        // HTML exporter setup (required)
        //.. the HTMLSettings object
        HTMLSettings htmlSettings = Docx4J.createHTMLSettings();

        htmlSettings.setImageDirPath(inputfilepath + "_files");
        htmlSettings.setImageTargetUri(inputfilepath.substring(inputfilepath
                .lastIndexOf("/") + 1) + "_files");
        htmlSettings.setWmlPackage(wordMLPackage);

        OutputStream os = new FileOutputStream(inputfilepath + ".html");

        // If you want XHTML output
        Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);

        // Prefer the exporter that uses an XSL transformation
        Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);

    }

}
Joe Doe
0

Ensure you have all the right dependencies (including the appropriate JAXB runtime):

implementation 'org.docx4j:docx4j-core:11.4.7'
implementation 'org.docx4j:docx4j-MOXy-JAXBContext:6.0.0'
implementation 'org.docx4j:docx4j-export-fo:11.4.7'
implementation 'org.docx4j:docx4j-JAXB-Internal:8.3.8'
implementation 'org.docx4j:docx4j-JAXB-ReferenceImpl:11.4.7'
implementation 'org.docx4j:docx4j-JAXB-MOXy:11.4.7'
implementation 'jakarta.xml.bind:jakarta.xml.bind-api:4.0.0'
implementation 'org.glassfish.jaxb:jaxb-runtime:4.0.0'
implementation 'jakarta.activation:jakarta.activation-api:2.1.0'
ram