I have two documents:
Document 2 is the result of passing Document 1 through a transformation process that leaves all content and formatting intact (verified by a side-by-side compare in Word).
However, the process removes many id attributes (the w:rsid* values) from the XML inside the .docx files.
For example,
<w:p w:rsidP="00B600D6" w:rsidR="00F55D78" w:rsidRDefault="00B600D6">
becomes
<w:p>
according to a dump of each document via the following code:
Body body = ((Document)newerPackage.getMainDocumentPart().getJaxbElement()).getBody();
Node node = org.docx4j.XmlUtils.marshaltoW3CDomDocument(body).getDocumentElement();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.transform(new DOMSource(node), new StreamResult(new OutputStreamWriter(System.out, "UTF-8")));
Using the docx4j Differencer comparison method recommended here, everything (except the first line, which has no formatting applied) is shown as a modification.
The question is: are the diffs a result of the missing ids, the formatting, or something else?
In case it's important, we're using docx4j in this context to perform automated sanity/regression tests on our round-tripping process (i.e. apply the "loss-less" process and expect no differences).
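One way to check whether the missing ids alone explain the diffs might be to strip the w:rsid* attributes from both bodies before comparing them. The following is only a rough sketch under that assumption: it uses the plain JDK DOM API rather than anything from docx4j, and the names RsidStripper, stripRsids and parse are hypothetical helpers made up for illustration. The sample XML in main is a cut-down version of the paragraph shown above, not a real dump.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class RsidStripper {
    // WordprocessingML main namespace (the "w:" prefix).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Recursively removes every w:-namespaced attribute whose local name
    // starts with "rsid" (rsidR, rsidP, rsidRDefault, rsidRPr, ...).
    static void stripRsids(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap attrs = node.getAttributes();
            // Collect first, then remove, so the live map is not
            // modified while it is being iterated.
            List<Attr> toRemove = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                if (W_NS.equals(a.getNamespaceURI())
                        && a.getLocalName().startsWith("rsid")) {
                    toRemove.add(a);
                }
            }
            for (Attr a : toRemove) {
                ((Element) node).removeAttributeNode(a);
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            stripRsids(c);
        }
    }

    // Parses an XML string into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String ns = " xmlns:w=\"" + W_NS + "\"";
        String content = "<w:r><w:t>Hi</w:t></w:r></w:p>";
        Document before = parse("<w:p" + ns
                + " w:rsidP=\"00B600D6\" w:rsidR=\"00F55D78\""
                + " w:rsidRDefault=\"00B600D6\">" + content);
        Document after = parse("<w:p" + ns + ">" + content);

        stripRsids(before.getDocumentElement());
        // Structural comparison of the two trees once the ids are gone.
        System.out.println(
                before.getDocumentElement().isEqualNode(after.getDocumentElement()));
    }
}
```

In the real test, stripRsids could presumably be applied to the `node` obtained from marshaltoW3CDomDocument for each document before dumping or diffing. If the two stripped trees still differ, the ids are probably not the whole story.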