1

I'm using the org.apache.xml.security.c14n package to canonicalize XML documents. I use the following code:

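// Assumes org.apache.xml.security.Init.init() has already been called once.
private String CanonicalizeXML(String XML) throws InvalidCanonicalizerException, CanonicalizationException, ParserConfigurationException, IOException, SAXException {

    Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
    // Canonical XML is always UTF-8, so use an explicit charset (java.nio.charset.StandardCharsets)
    // instead of the platform default.
    byte[] canonical = canon.canonicalize(XML.getBytes(StandardCharsets.UTF_8));
    return new String(canonical, StandardCharsets.UTF_8);
}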

However, it doesn't work as I expected: it doesn't remove any of the unnecessary whitespace between elements. Am I doing something wrong?

Thanks,

Ivan

Ivan

3 Answers

2

I think it may be your expectation that is incorrect:

You don't say which version of XML Canonicalization, but both 1.0 and 1.1 say:

All whitespace in character content is retained (excluding characters removed during line feed normalization)

Chris Dickson
1

Is your XML document referencing a DTD or schema? Without one of those, the parser has no way to know which whitespace is significant, so it has to preserve all of it.
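To illustrate the point about DTDs, here is a minimal sketch using only the JDK's built-in DOM parser. It assumes a tiny internal DTD is acceptable for your documents; the class name `DtdWhitespaceDemo`, the sample documents, and the helper `countRootChildren` are made up for this illustration. With a DTD and a validating parser, `setIgnoringElementContentWhitespace(true)` lets the parser drop whitespace that sits between child elements; without a DTD the same whitespace is kept as text nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DtdWhitespaceDemo {
    // Hypothetical sample body: two items separated by indentation whitespace.
    static final String BODY = "<root>\n  <item>a</item>\n  <item>b</item>\n</root>";

    // Same body, preceded by a small internal DTD that declares the content model.
    static final String WITH_DTD =
          "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + BODY;

    // Parses the XML and returns how many child nodes the root element has.
    // When dtdAware is true, the parser validates against the DTD and drops
    // whitespace in element-only content.
    static int countRootChildren(String xml, boolean dtdAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(dtdAware);
        factory.setIgnoringElementContentWhitespace(dtdAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }
}
```

Calling `countRootChildren(DtdWhitespaceDemo.WITH_DTD, true)` should see only the two `item` elements, while `countRootChildren(DtdWhitespaceDemo.BODY, false)` also counts the three whitespace text nodes around them. The resulting DOM could then be passed on to the canonicalizer.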

Jörn Horstmann
  • Hmm... That explains a lot. I don't have a schema, so I will obviously have to create one. Or is there another way to force it to delete the whitespace, since it is irrelevant? A very simple schema, or something like that. – Ivan Feb 28 '11 at 11:56
0

The org.apache.xml.security.c14n package does not remove whitespace.

I resolved this by calling setIgnoringBoundaryWhitespace(true) on my SAXBuilder:

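// "is" is an InputStream over the XML to be canonicalized.
SAXBuilder builder = new SAXBuilder();
// Drop whitespace-only text that sits between element tags.
builder.setIgnoringBoundaryWhitespace(true);
org.jdom2.Document doc = builder.build(is);
// Convert the JDOM document to an org.w3c.dom.Document.
DOMOutputter out = new DOMOutputter();
Document docW3 = out.output(doc);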
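If you would rather not pull in JDOM, a similar effect can probably be had with the JDK's DOM API alone. The following is only a sketch: the class `WhitespaceStripper` and its methods are hypothetical names, not part of any library. It walks the tree and removes text nodes that contain nothing but whitespace, which also affects mixed content (for example, a single space between two inline elements would be lost), so it is only appropriate when such whitespace really is irrelevant.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class WhitespaceStripper {
    // Recursively removes text nodes that consist only of whitespace.
    // Text with any non-whitespace content is left untouched, including
    // its leading and trailing spaces.
    static void stripWhitespaceOnlyText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling first, because removeChild detaches the node.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceOnlyText(child);
            }
            child = next;
        }
    }

    // Convenience helper: parses an XML string into a namespace-aware DOM.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

After `WhitespaceStripper.stripWhitespaceOnlyText(doc.getDocumentElement())`, the document can be handed to the canonicalizer in place of the original.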
Gonzalo Gallotti