I have a simple XML file with an XSD schema, where some elements are allowed to contain only certain elements, e.g.
<xsd:element name="day" type="xsd:date"/>
<xsd:element name="interval">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="day" minOccurs="2" maxOccurs="2"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
and the XML code:
<interval>
    <day>2016-08-21</day>
    <day>2016-10-21</day>
</interval>
If I type anything other than whitespace or `day` elements within the `interval` tags, the document (correctly) fails to validate. Now, using `lxml` in Python, I extracted the canonical version (C14N) of this XML, and I found that the whitespace (those 4 spaces of indentation) was kept, as the standard says.
I then need to digitally sign this document, but I do not understand why anyone would sign that whitespace. It seems like a weakness to me: different indentation yields different canonical XML (and therefore mismatching signatures), yet this is an unambiguous case in which the whitespace has nothing to do with the represented data (all the more so as the schema would reject any meaningful content in that position).
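To make the problem concrete, here is a minimal sketch of what I am seeing. It is an illustration under a few assumptions: it uses the standard library's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+) instead of `lxml` so that it runs without extra dependencies, the variable names are mine, and a SHA-256 digest stands in for the actual signature step.

```python
import hashlib
import xml.etree.ElementTree as ET

# The same data, once indented with 4 spaces and once without any whitespace.
indented = (
    "<interval>\n"
    "    <day>2016-08-21</day>\n"
    "    <day>2016-10-21</day>\n"
    "</interval>"
)
compact = "<interval><day>2016-08-21</day><day>2016-10-21</day></interval>"


def digest(canonical_xml: str) -> str:
    """SHA-256 over the canonical bytes; a stand-in for the signed digest."""
    return hashlib.sha256(canonical_xml.encode("utf-8")).hexdigest()


# Canonicalization keeps the whitespace between the child elements,
# so the two canonical forms (and their digests) differ.
canonical_indented = ET.canonicalize(indented)
canonical_compact = ET.canonicalize(compact)
print(canonical_indented == canonical_compact)
print(digest(canonical_indented) == digest(canonical_compact))

# strip_text=True trims leading/trailing whitespace from every text node
# before serializing, which makes the two forms identical here.
canonical_stripped = ET.canonicalize(indented, strip_text=True)
print(canonical_stripped == canonical_compact)
```

Note that `strip_text` trims all text content regardless of the schema, so it would also alter string data that legitimately starts or ends with whitespace; it is not the schema-aware behaviour I am asking about.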
- Why is that whitespace part of the canonical representation of an XML document used in digital signatures?
- Is there any way of enforcing in the XSD the removal of such useless whitespace?
I was thinking more specifically of the `whiteSpace` facet. By specifying `collapse`, the whitespace should be removed on validation; but it seems that `whiteSpace` cannot be applied to a `complexType`, and I could not find a way of combining it with a `sequence`.
- Can I apply the `whiteSpace` facet to a `complexType` (element-only) node?