I want to canonicalize the XML marshalled by JAXB in accordance with the Canonical XML spec.
If I write this:
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
will it work?
If not, can JAXB do the job?
Below is my initial answer based on: http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Terminology
1 - The document is encoded in UTF-8
By default JAXB marshals to the UTF-8 encoding.
2 - Line breaks normalized to #xA on input, before parsing
JAXB doesn't retain line breaks so this doesn't really apply.
3 - Attribute values are normalized, as if by a validating processor
You can specify a Schema on the Unmarshaller (via Unmarshaller.setSchema) to have a JAXB implementation use a validating processor.
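As a rough sketch of where that Schema object could come from: the standard javax.xml.validation API compiles an XSD into a Schema, which is the type Unmarshaller.setSchema accepts. The schema text, the element name person, and the class name SchemaSketch are made up for illustration, and the JAXB call appears only as a comment so the example runs without a JAXB jar on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaSketch {
    // Hypothetical schema: a <person> element with a required integer "id" attribute.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='person'><xs:complexType>"
      + "<xs:attribute name='id' type='xs:int' use='required'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    // Compile the XSD text into a reusable Schema object.
    static Schema loadSchema() throws SAXException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return sf.newSchema(new StreamSource(new StringReader(XSD)));
    }

    // Validate a document against the schema; the default error handling
    // throws SAXException on the first validation error.
    static boolean isValid(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = loadSchema();
        // With JAXB available, the same object would be handed over like this:
        // unmarshaller.setSchema(schema);
        System.out.println(isValid("<person id='1'/>"));
        System.out.println(isValid("<person/>"));
    }
}
```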
4 - Character and parsed entity references are replaced
Most JAXB implementations delegate this work to the underlying parser used.
5 - CDATA sections are replaced with their character content
The standard JAXB APIs do not allow you to marshal to a CDATA section, so you are ok here.
6 - The XML declaration and document type declaration (DTD) are removed
JAXB does not write out a DTD declaration. You can remove the XML declaration by doing the following:
marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
7 - Empty elements are converted to start-end tag pairs
JAXB does not write out empty elements as start-end tag pairs. It should be possible to come up with a workaround for this.
8 - Whitespace outside of the document element and within start and end tags is normalized
9 - All whitespace in character content is retained (excluding characters removed during line feed normalization)
JAXB implementations retain all whitespace in character content (between start/element tags).
10 - Attribute value delimiters are set to quotation marks (double quotes)
The reference and MOXy JAXB implementations use double quotes for attribute value delimiters.
11 - Special characters in attribute values and character content are replaced by character references
JAXB will replace & with &amp;, < with &lt;, and " with &quot;.
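For comparison, the Canonical XML spec asks for slightly different replacements in character content than in attribute values. The helper below is only a sketch of those two rule sets as I read them from the spec, not something JAXB exposes; the class and method names are hypothetical.

```java
class C14nEscape {
    // Character content: &, <, > and carriage return are replaced.
    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;"); break;
                case '<':  sb.append("&lt;");  break;
                case '>':  sb.append("&gt;");  break;
                case '\r': sb.append("&#xD;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Attribute values: &, <, the double quote, and tab, line feed,
    // carriage return are replaced. Note that > is left alone here.
    static String escapeAttribute(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\t': sb.append("&#x9;");  break;
                case '\n': sb.append("&#xA;");  break;
                case '\r': sb.append("&#xD;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("a < b & c > d"));
        System.out.println(escapeAttribute("say \"hi\" > now"));
    }
}
```

Comparing the marshalled output against these two functions is one way to check whether a given JAXB implementation matches the spec on this point.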
12 - Superfluous namespace declarations are removed from each element
JAXB implementations do their best not to write extra namespace declarations, but cannot guarantee that no extra namespaces are declared. There are some workarounds you can use to address this issue.
13 - Default attributes are added to each element
TBD
14 - Lexicographic order is imposed on the namespace declarations and attributes of each element
JAXB implementations do not guarantee the ordering of the namespace declarations and attributes of each element.
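To make the ordering rule in point 14 concrete, here is a sketch that computes the order the spec expects for one element of a DOM tree (for example, one that JAXB marshalled into a DOM node): namespace declarations first, sorted by prefix with the default declaration leading, then attributes sorted by namespace URI and local name. The class AttributeOrderSketch and its methods are hypothetical, and Java's String comparison is used as a stand-in for the spec's code point order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class AttributeOrderSketch {
    // Prefix bound by a namespace declaration; "" for the default declaration.
    static String declaredPrefix(Attr a) {
        return "xmlns".equals(a.getNodeName()) ? "" : a.getLocalName();
    }

    // Qualified names of declarations and attributes in canonical order.
    static List<String> orderedNames(Element e) {
        List<Attr> nsDecls = new ArrayList<>();
        List<Attr> attrs = new ArrayList<>();
        NamedNodeMap map = e.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            Attr a = (Attr) map.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                nsDecls.add(a);
            } else {
                attrs.add(a);
            }
        }
        Comparator<Attr> byPrefix = Comparator.comparing(AttributeOrderSketch::declaredPrefix);
        // Attributes without a namespace sort first (empty URI), then by local name.
        Comparator<Attr> byUriThenLocal = Comparator
            .comparing((Attr a) -> a.getNamespaceURI() == null ? "" : a.getNamespaceURI())
            .thenComparing(Attr::getLocalName);
        nsDecls.sort(byPrefix);
        attrs.sort(byUriThenLocal);

        List<String> names = new ArrayList<>();
        for (Attr a : nsDecls) names.add(a.getNodeName());
        for (Attr a : attrs) names.add(a.getNodeName());
        return names;
    }

    // Namespace-aware parse, so xmlns attributes carry the xmlns namespace URI.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<e xmlns:b='urn:b' xmlns:a='urn:z' xmlns='urn:d' "
                   + "b:x='1' a:y='2' c='3' a='4'/>";
        System.out.println(orderedNames(parseRoot(xml)));
    }
}
```

Note that a:y sorts after b:x in this input because the sort key is the namespace URI (urn:z versus urn:b), not the prefix.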
No. In my experience, I would not rely on the JAXB_FORMATTED_OUTPUT property; it only controls whether the output is formatted with line breaks and indentation.
You can test it with the examples from the spec you mentioned.