
I need to find the differences between two Word .docx files, and I am using docx4j. First, I had to change the SmartXMLFormatter constructor so that it maps the Word namespaces:

    public SmartXMLFormatter(Writer w) throws IOException {
        this.xml = new XMLWriterNSImpl(w, false);
        if (this.writeXMLDeclaration) {
            this.xml.xmlDecl();
            this.writeXMLDeclaration = false;
        }

        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "w");
        this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2010/wordml", "w14");
        this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2012/wordml", "w15");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "r");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing", "wp");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/main", "a");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/picture", "pic");

        this.xml.setPrefixMapping(Constants.BASE_NS_URI, "dfx");
        this.xml.setPrefixMapping(Constants.DELETE_NS_URI, "del");
        this.xml.setPrefixMapping(Constants.INSERT_NS_URI, "ins");
    }
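
Prefix mappings only help if the writer also emits matching xmlns: declarations into the output. As an analogy using the JDK's own StAX writer (not diffx's XMLWriterNSImpl, whose internals may behave differently), the sketch below shows that a writer can know a prefix binding internally and still produce w14:paraId without declaring xmlns:w14; in non-repairing mode only writeNamespace emits the declaration. The class name PrefixDeclarationDemo and the paraId value are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PrefixDeclarationDemo {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String W14_NS = "http://schemas.microsoft.com/office/word/2010/wordml";

    /** Writes a w:p element with a w14:paraId attribute; xmlns:w14 is written only if declareW14. */
    static String writeParagraph(boolean declareW14) throws XMLStreamException {
        StringWriter out = new StringWriter();
        // Default factory: namespace repairing is off, so nothing is declared automatically
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        // setPrefix only records the binding inside the writer; it writes nothing
        xml.setPrefix("w", W_NS);
        xml.setPrefix("w14", W14_NS);
        xml.writeStartElement("w", "p", W_NS);
        xml.writeNamespace("w", W_NS);
        if (declareW14) {
            xml.writeNamespace("w14", W14_NS); // this call emits xmlns:w14="..."
        }
        xml.writeAttribute("w14", W14_NS, "paraId", "1A2B3C4D");
        xml.writeEndElement();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeParagraph(false)); // w14 used but never declared
        System.out.println(writeParagraph(true));  // w14 declared on the element
    }
}
```

The first output is exactly the kind of markup a namespace-aware parser rejects later with "prefix is not bound".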

After that change, everything works fine for documents without Russian letters. But when I diff two .docx documents that contain Russian characters, the following exception is raised:

    org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; The prefix "w14" for attribute "w14:paraId" associated with an element type "w:p" is not bound.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
    at docx4jDiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Exception in thread "main" javax.xml.bind.UnmarshalException
     - with linked exception:
    [org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; The prefix "w14" for attribute "w14:paraId" associated with an element type "w:p" is not bound.]
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
    at docx4jDiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; The prefix "w14" for attribute "w14:paraId" associated with an element type "w:p" is not bound.
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    ... 7 more
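
The error can be reproduced without docx4j: as the stack trace shows, JAXB unmarshals through a namespace-aware SAX parser, and such a parser rejects any prefixed attribute whose prefix has no xmlns declaration in scope. A minimal sketch; the class name UnboundPrefixDemo and the paraId value are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class UnboundPrefixDemo {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String W14_NS = "http://schemas.microsoft.com/office/word/2010/wordml";

    // w14 is used but never declared: the parser reports "prefix ... is not bound"
    static final String BROKEN = "<w:p xmlns:w=\"" + W_NS + "\" w14:paraId=\"1A2B3C4D\"/>";
    // the same paragraph with xmlns:w14 declared parses cleanly
    static final String FIXED = "<w:p xmlns:w=\"" + W_NS + "\" xmlns:w14=\"" + W14_NS
            + "\" w14:paraId=\"1A2B3C4D\"/>";

    /** Returns null if the XML parses, otherwise the parser's (locale-dependent) message. */
    static String tryParse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // prefixes are only checked in namespace-aware mode
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("broken: " + tryParse(BROKEN));
        System.out.println("fixed:  " + tryParse(FIXED));
    }
}
```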

Can anyone please help me?

Here is the main code:

    // Imports assume docx4j 3.x package names; adjust them for other versions.
    import java.io.File;
    import java.io.StringWriter;
    import java.util.ArrayList;
    import java.util.List;

    import javax.xml.bind.JAXBContext;

    import com.topologi.diffx.Docx4jDriver;

    import org.docx4j.XmlUtils;
    import org.docx4j.diff.Differencer;
    import org.docx4j.fonts.IdentityPlusMapper;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.openpackaging.parts.relationships.Namespaces;
    import org.docx4j.openpackaging.parts.relationships.RelationshipsPart;
    import org.docx4j.relationships.Relationship;
    import org.docx4j.wml.Body;
    import org.docx4j.wml.Document;

    public class CompareDocumentsUsingDriver {

        public static JAXBContext context = org.docx4j.jaxb.Context.jc;

        /**
         * @param args
         */
        public static void main(String[] args) throws Exception {
            // Note: changing file.encoding at runtime usually has no effect, because the
            // default charset is fixed at JVM startup; pass -Dfile.encoding=UTF-8 instead.
            System.setProperty("file.encoding", "UTF-8");

            String newerfilepath = "B.docx";
            String olderfilepath = "A.docx";

            // 1. Load the packages
            WordprocessingMLPackage newerPackage = WordprocessingMLPackage.load(new File(newerfilepath));
            WordprocessingMLPackage olderPackage = WordprocessingMLPackage.load(new File(olderfilepath));

            Body newerBody = ((Document) newerPackage.getMainDocumentPart().getJaxbElement()).getBody();
            Body olderBody = ((Document) olderPackage.getMainDocumentPart().getJaxbElement()).getBody();

            System.out.println("Differencing..");

            // 2. Do the differencing
            StringWriter sw = new StringWriter();
            Docx4jDriver.diff(
                    XmlUtils.marshaltoW3CDomDocument(newerBody).getDocumentElement(),
                    XmlUtils.marshaltoW3CDomDocument(olderBody).getDocumentElement(),
                    sw);
            // The signature which takes Reader objects appears to be broken

            // 3. Get the result
            String contentStr = sw.toString();
            System.out.println("Result: \n\n " + contentStr);

            Body newBody = (Body) XmlUtils.unwrap(XmlUtils.unmarshalString(contentStr));

            // In the general case, you need to handle relationships. Not done here!
            // RelationshipsPart rp = newerPackage.getMainDocumentPart().getRelationshipsPart();
            // handleRels(pd, rp);

            // Put the diff result into the package; otherwise newBody is never used
            // and COMPARED.docx is just a copy of B.docx.
            ((Document) newerPackage.getMainDocumentPart().getJaxbElement()).setBody(newBody);

            newerPackage.setFontMapper(new IdentityPlusMapper());
            newerPackage.save(new File("COMPARED.docx"));
        }

        /**
         * In the general case, you need to handle relationships. Although not
         * necessary in this simple example, we do it anyway for the purposes of
         * illustration.
         */
        private static void handleRels(Differencer pd, RelationshipsPart rp) {
            // Since we are going to add rels appropriate to the docs being compared,
            // remove any existing rels which point to images, for neatness and to
            // avoid duplication (duplication of internal part names is fatal in Word,
            // and the export XSLT makes images internal, though it does avoid
            // duplicating a part).
            List<Relationship> relsToRemove = new ArrayList<Relationship>();
            for (Relationship r : rp.getRelationships().getRelationship()) {
                // Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
                if (r.getType().equals(Namespaces.IMAGE)) {
                    relsToRemove.add(r);
                }
            }
            for (Relationship r : relsToRemove) {
                rp.removeRelationship(r);
            }

            // Now add the rels we composed
            List<Relationship> newRels = pd.getComposedRels();
            for (Relationship nr : newRels) {
                rp.addRelationship(nr);
            }
        }
    }

Best regards,

Tim

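EDIT: I solved the namespace problem by changing openResult in Docx4jDriver so that the root element declares all known namespaces: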

    public static void openResult(String nodename, Writer out) throws IOException {
        // In general, we need to avoid writing directly to Writer out...
        // since it can happen before formatter output gets there

        // namespaces not properly declared:
        // 4 options:
        // 1:
        // OpenElementEvent containerOpen = new OpenElementEventNSImpl(xml1.getNamespaceURI(), rootNodeName);
        // formatter.format(containerOpen);
        // // AttributeEvent wNS = new AttributeEventNSImpl("http://www.w3.org/2000/xmlns/" , "w",
        // //       "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        // // formatter.format(wNS);
        // but AttributeEvent is too late in the process to set the mapping.
        // so you can comment that out.
        // But you still have to add w: and the other namespaces in
        // SmartXMLFormatter constructor. So may as well do 2.:
        // 2: stick all known namespaces on our root element above
        // 3: fix SmartXMLFormatter
        // Go with option 2 .. since this is clear
        out.append("<" + nodename
                + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""  // w: namespace
                + " xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\""
                + " xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\""
                + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
                + " xmlns:v=\"urn:schemas-microsoft-com:vml\""
                + " xmlns:w14=\"http://schemas.microsoft.com/office/word/2010/wordml\""
                + " xmlns:w15=\"http://schemas.microsoft.com/office/word/2012/wordml\""
                + " xmlns:w10=\"urn:schemas-microsoft-com:office:word\""
                + " xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\""
                + " xmlns:dfx=\"" + Constants.BASE_NS_URI + "\""  // Add these, since SmartXMLFormatter only writes them on the first fragment
                + " xmlns:del=\"" + Constants.DELETE_NS_URI + "\""
                + " xmlns:ins=\"" + Constants.INSERT_NS_URI + "\""  // ins: must be bound to the insert namespace
                + " >");
    }
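
Concatenating declarations by hand makes copy-paste slips easy, such as a prefix bound to the wrong constant. A sketch of the same root-element idea driven by one prefix-to-URI map, then checked by parsing a wrapped fragment. The names RootWrapper, rootOpenTag and parseWrapped are hypothetical; the dfx/del/ins entries are left out because the values of the diffx Constants are not shown here, but in the real driver they would go into the same map.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RootWrapper {
    /** Word namespaces copied from openResult above; insertion order is kept. */
    static final Map<String, String> WORD_NAMESPACES = new LinkedHashMap<>();
    static {
        WORD_NAMESPACES.put("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        WORD_NAMESPACES.put("a", "http://schemas.openxmlformats.org/drawingml/2006/main");
        WORD_NAMESPACES.put("pic", "http://schemas.openxmlformats.org/drawingml/2006/picture");
        WORD_NAMESPACES.put("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
        WORD_NAMESPACES.put("v", "urn:schemas-microsoft-com:vml");
        WORD_NAMESPACES.put("w14", "http://schemas.microsoft.com/office/word/2010/wordml");
        WORD_NAMESPACES.put("w15", "http://schemas.microsoft.com/office/word/2012/wordml");
        WORD_NAMESPACES.put("w10", "urn:schemas-microsoft-com:office:word");
        WORD_NAMESPACES.put("wp", "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing");
    }

    /** Hypothetical helper: builds an opening tag declaring every prefix in the map. */
    static String rootOpenTag(String nodeName, Map<String, String> namespaces) {
        StringBuilder sb = new StringBuilder("<").append(nodeName);
        for (Map.Entry<String, String> ns : namespaces.entrySet()) {
            sb.append(" xmlns:").append(ns.getKey())
              .append("=\"").append(ns.getValue()).append('"');
        }
        return sb.append('>').toString();
    }

    /** Wraps a fragment in the root element and parses it namespace-aware; throws on error. */
    static void parseWrapped(String nodeName, String fragment) throws Exception {
        String doc = rootOpenTag(nodeName, WORD_NAMESPACES) + fragment + "</" + nodeName + ">";
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(doc)), new DefaultHandler());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootOpenTag("w:body", WORD_NAMESPACES));
        // Cyrillic text ("Privet") written as escapes so the source file encoding does not matter
        parseWrapped("w:body",
                "<w:p w14:paraId=\"1A2B3C4D\"><w:r><w:t>\u041f\u0440\u0438\u0432\u0435\u0442</w:t></w:r></w:p>");
        System.out.println("parsed OK");
    }
}
```

Because every fragment is parsed inside a root that declares w14, the unbound-prefix error from the question cannot occur for any prefix listed in the map.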
  • Okay, I solved it on my own: I added the code shown above under EDIT to the Docx4jDriver. Sorry, I am new to Stack Overflow. – Tim Schwalbe Jun 24 '14 at 13:10
  • But now, when I compare two identical documents, everything runs perfectly. When the files contain differences, the output file is still generated, but Word reports an unknown error when opening it. I don't think anyone can help me... – Tim Schwalbe Jun 24 '14 at 13:14
