
The XML Schema specification says this about the whiteSpace facet:

For all datatypes ·derived· by ·union· whiteSpace does not apply directly; however, the normalization behavior of ·union· types is controlled by the value of whiteSpace on that one of the ·memberTypes· against which the ·union· is successfully validated.

and:

for string the value of whiteSpace is preserve

I am attempting to add an XSD to an already existing XML system. I sometimes need to match existing messy strings. For example let's say I need to match:

  <?xml version="1.0" encoding="UTF-8" ?>
  <root1 stuff="Hello World&#x09;!" />

With the XSD below all is well, because for xs:string the value of whiteSpace is preserve.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">  
  <xs:simpleType name="Hello">
    <xs:restriction base="xs:string">
      <xs:pattern value="Hello World&#x09;!"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="root1">
    <xs:complexType>
      <xs:attribute name="stuff" type="Hello" />
    </xs:complexType>
  </xs:element>
</xs:schema>

The problem starts when I need to extend the legal values of "stuff" to a union with other simple types. It does not matter what I union with, or whether I union with anything else at all, so the simplest demonstration is to add these definitions to the XSD above.

  <xs:simpleType name="HellUnion">
    <xs:union memberTypes="Hello" />
  </xs:simpleType>
  <xs:element name="root2">
    <xs:complexType>
      <xs:attribute name="stuff" type="HellUnion" />
    </xs:complexType>
  </xs:element>

Then Xerces-C rejects this XML as not valid:

<?xml version="1.0" encoding="UTF-8" ?>
<root2 stuff="Hello World&#x09;!" />

I think the error message is enlightening:

value 'Hello World !' does not match any member types of the union

Notice the space before the '!' where the tab was previously.
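The value in the error message is what either the replace or the collapse setting of whiteSpace would produce from the original attribute value. As a rough illustration only (the type names WsPreserve, WsReplace and WsCollapse are hypothetical and not part of the schema above), the three possible settings look like this, with comments describing what each would do to "Hello World&#x09;!":

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type. preserve: the value is left untouched, so the tab survives. -->
  <xs:simpleType name="WsPreserve">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="preserve"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Hypothetical type. replace: every tab, line feed and carriage return
       becomes one space, so the tab turns into a space. -->
  <xs:simpleType name="WsReplace">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="replace"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Hypothetical type. collapse: as replace, then runs of spaces shrink to one
       space and leading/trailing spaces are dropped. For this value the result
       is the same as replace. -->
  <xs:simpleType name="WsCollapse">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Because the test value contains a single tab between non-space characters, it cannot tell replace and collapse apart. A value with two adjacent tabs would.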

The Microsoft .NET XML/XSD validator reports the same document as valid.

I have tried adding an explicit whiteSpace facet with value "preserve" to the "Hello" simple type definition:

  <xs:simpleType name="Hello">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="preserve"/>
      <xs:pattern value="Hello World&#x09;!"/>
    </xs:restriction>
  </xs:simpleType>

This does not help.

The above is only a minimal demonstration. The real problem comes from unions of regular expressions that allow white space. In that case the document is found to be valid, but the values returned to the application are white-space collapsed when validation is on and not collapsed when validation is off.
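A minimal sketch of that kind of schema, for anyone who wants to reproduce it. All names here (Words, Number, WordsOrNumber, root3) are hypothetical stand-ins, not the real definitions from the product:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical member type: words separated by runs of spaces or tabs -->
  <xs:simpleType name="Words">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Za-z]+([ \t]+[A-Za-z]+)*"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Hypothetical second member type: digits only -->
  <xs:simpleType name="Number">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]+"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Union of the two; this is where the normalization behavior is in question -->
  <xs:simpleType name="WordsOrNumber">
    <xs:union memberTypes="Words Number"/>
  </xs:simpleType>
  <xs:element name="root3">
    <xs:complexType>
      <xs:attribute name="stuff" type="WordsOrNumber"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance such as `<root3 stuff="Hello&#x09;&#x09;World" />` matches the Words pattern whether or not the white space has been collapsed first, so validity alone does not reveal the normalization. The difference only shows up in the attribute value handed to the application.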

This is a problem when moving a product from unvalidated XML to validated XML. These values need to exactly match string values elsewhere (not in XML) in the program.

Perhaps someone can tell me a better way to set "white space preserve" on a simple type?

Tb.
  • Possible duplicate of [XML Schema union ignore whiteSpace property](http://stackoverflow.com/questions/27307880/xml-schema-union-ignore-whitespace-property) – sergioFC Oct 01 '16 at 09:49
  • Yes they are similar, but this is clearly a Xerces bug and the other link was reported as a saxon bug. – Tb. Aug 11 '17 at 00:51
