I want to canonicalize the XML that JAXB marshals, in accordance with the Canonical XML specification.

If I write this:

marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);

will it work?

If not, can JAXB do the job?

Shaun
Bastiflew

2 Answers

3

Below is my initial answer based on: http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Terminology

1 - The document is encoded in UTF-8

By default JAXB marshals to the UTF-8 encoding.

2 - Line breaks normalized to #xA on input, before parsing

JAXB doesn't retain line breaks so this doesn't really apply.

3 - Attribute values are normalized, as if by a validating processor

You can set a Schema (javax.xml.validation.Schema) on the Unmarshaller with setSchema to have a JAXB implementation use a validating processor.
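A minimal sketch of that setup, under some assumptions: the schema is a hypothetical one-element XSD inlined as a string, and the class and method names are made up for illustration. It builds the javax.xml.validation.Schema that Unmarshaller.setSchema accepts using only JDK classes; the JAXB call itself appears in a comment because it needs a JAXBContext for your own classes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SchemaForUnmarshaller {
    // Hypothetical schema: a single <note> element holding a string.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note' type='xs:string'/>"
        + "</xs:schema>";

    // Compiles an XSD (given as text) into a reusable Schema object.
    static Schema buildSchema(String xsd) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
    }

    // Returns true when the document validates against the schema.
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error reported by the validator
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // With JAXB on the classpath, the same Schema would be handed over like this:
    //   unmarshaller.setSchema(buildSchema(XSD));
}
```

The isValid helper is only there to show the Schema works on its own; with JAXB the validation happens during unmarshalling once setSchema has been called.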

4 - Character and parsed entity references are replaced

Most JAXB implementations delegate this work to the underlying parser used.

5 - CDATA sections are replaced with their character content

The standard JAXB APIs do not allow you to marshal to a CDATA section, so you are OK here.

6 - The XML declaration and document type declaration (DTD) are removed

JAXB does not write out a DTD declaration. You can remove the XML declaration by doing the following:

marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);

7 - Empty elements are converted to start-end tag pairs

JAXB does not write out empty elements as start-end tag pairs. It should be possible to come up with a workaround for this.

8 - Whitespace outside of the document element and within start and end tags is normalized

9 - All whitespace in character content is retained (excluding characters removed during line feed normalization)

JAXB implementations retain all whitespace in character content (between start and end tags).

10 - Attribute value delimiters are set to quotation marks (double quotes)

The reference implementation and MOXy both use double quotes for attribute value delimiters.

11 - Special characters in attribute values and character content are replaced by character references

JAXB will replace & with &amp;, < with &lt;, and " with &quot;
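For comparison, this is a sketch of the escaping that Canonical XML 1.0 itself asks for in text content and in attribute values. It is not JAXB code; the class and method names are hypothetical. Note that the spec also escapes > and carriage returns in text, and tab, line feed and carriage return in attribute values, so it is worth checking whether your JAXB output matches in those cases.

```java
final class C14nEscaping {
    // Escaping for character content (text nodes) per Canonical XML 1.0.
    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;"); break;
                case '<':  sb.append("&lt;");  break;
                case '>':  sb.append("&gt;");  break;
                case '\r': sb.append("&#xD;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escaping for attribute values per Canonical XML 1.0.
    // '>' is left alone here; whitespace other than space becomes a reference.
    static String escapeAttribute(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\t': sb.append("&#x9;");  break;
                case '\n': sb.append("&#xA;");  break;
                case '\r': sb.append("&#xD;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```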

12 - Superfluous namespace declarations are removed from each element

JAXB implementations do their best not to write extra namespace declarations, but cannot guarantee that extra namespaces are not declared. There are some workarounds you can do to address this issue.

13 - Default attributes are added to each element

TBD

14 - Lexicographic order is imposed on the namespace declarations and attributes of each element

JAXB implementations do not guarantee the ordering of the namespace declarations and attributes of each element.
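To make the ordering rule in point 14 concrete, here is a sketch (plain JDK DOM, not something JAXB provides) that lists the attributes of a document's root element in the order Canonical XML expects: namespace declarations first, sorted by prefix with the default namespace leading, then ordinary attributes sorted by namespace URI and local name. The class and method names are hypothetical, and the comparison uses Java's String ordering, which is an approximation of the spec's code-point ordering for characters outside the Basic Multilingual Plane.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

final class C14nAttributeOrder {
    // Returns the qualified names of the root element's attributes in canonical order.
    static List<String> orderedAttributeNames(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to tell xmlns declarations apart
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }

        List<Attr> namespaceDecls = new ArrayList<>();
        List<Attr> attributes = new ArrayList<>();
        NamedNodeMap all = root.getAttributes();
        for (int i = 0; i < all.getLength(); i++) {
            Attr attr = (Attr) all.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                namespaceDecls.add(attr);
            } else {
                attributes.add(attr);
            }
        }

        // xmlns="..." has no prefix and sorts first; xmlns:p="..." sorts by p.
        namespaceDecls.sort(Comparator.comparing(
                (Attr a) -> a.getPrefix() == null ? "" : a.getLocalName()));
        // Unqualified attributes have no namespace URI and therefore sort first.
        attributes.sort(Comparator
                .comparing((Attr a) -> a.getNamespaceURI() == null ? "" : a.getNamespaceURI())
                .thenComparing((Attr a) -> a.getLocalName()));

        List<String> names = new ArrayList<>();
        for (Attr a : namespaceDecls) names.add(a.getName());
        for (Attr a : attributes) names.add(a.getName());
        return names;
    }
}
```

Running this against the output of a marshaller is one way to check whether a given JAXB implementation happens to emit attributes in canonical order for a particular document.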

bdoughan
  • Thanks :) Regarding 7 (empty elements are converted to start-end tag pairs): if I create an empty object, what JAXB writes looks OK, I think. – Bastiflew Dec 08 '11 at 16:56
  • In the end I found a library (http://santuario.apache.org) that provides canonicalization in accordance with the W3C spec. – Bastiflew Dec 24 '11 at 10:09
-1

No. In my experience, I would not rely on the JAXB_FORMATTED_OUTPUT property. You can test it with the examples from the spec you mentioned.

korifey