Applying normalization to xs:token during deserialization

Question

I have an XML schema that defines a simple type based on xs:token, with a maximum length restriction.

When I validate an XML document against this schema, the validation correctly applies normalization to the content. Specifically, contiguous whitespace characters are replaced by a single space. E.g. "A      B" (with a run of several spaces) is normalized to "A B" before the maximum length restriction is checked.
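To make the rule concrete, here is a minimal sketch of the XSD "collapse" whitespace processing that xs:token inherits: tab, line feed and carriage return are treated as spaces, runs of spaces become a single space, and leading and trailing spaces are dropped. The class and method names (XsdWhitespace, Collapse) are hypothetical helpers written for illustration, not part of the .NET Framework.

```csharp
using System.Text;

// Hypothetical helper for illustration only; not a .NET Framework API.
public static class XsdWhitespace
{
    // Applies the XSD whiteSpace="collapse" rule to a lexical value.
    public static string Collapse(string value)
    {
        if (value == null)
        {
            return null;
        }

        var sb = new StringBuilder(value.Length);
        bool pendingSpace = false;

        foreach (char c in value)
        {
            // XML whitespace is exactly these four characters.
            bool isXmlSpace = c == ' ' || c == '\t' || c == '\n' || c == '\r';

            if (isXmlSpace)
            {
                // Remember a separator only after some content has been
                // written, which discards leading whitespace.
                pendingSpace = sb.Length > 0;
            }
            else
            {
                if (pendingSpace)
                {
                    sb.Append(' ');
                    pendingSpace = false;
                }
                sb.Append(c);
            }
        }

        // A pending space at the end is never written, which discards
        // trailing whitespace.
        return sb.ToString();
    }
}
```

Applied to an "A" and a "B" separated by a long run of spaces, Collapse returns the three-character string "A B", which satisfies a maxLength of 5.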

However, when I deserialize the XML document into types generated by xsd.exe, the normalization is not applied. This can result in strings which are longer than the schema allows.

For reference, I'm using C# and .NET 4.5.2.

Here's a minimal example to demonstrate the issue.

Example XML schema:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="http://tempuri.org/XMLSchema.xsd"
    elementFormDefault="qualified"
    xmlns="http://tempuri.org/XMLSchema.xsd"
    xmlns:mstns="http://tempuri.org/XMLSchema.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>
  <xs:element name="testElement" type="testType"/>

  <xs:complexType name="testType">
    <xs:sequence>
      <xs:element name="name" type="shortToken"/>
    </xs:sequence>
  </xs:complexType>

  <xs:simpleType name="shortToken">
    <xs:restriction base="xs:token">
      <xs:maxLength value="5"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

The type generated by giving the schema to xsd.exe:

[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "4.0.30319.18020")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://tempuri.org/XMLSchema.xsd")]
[System.Xml.Serialization.XmlRootAttribute("testElement", Namespace="http://tempuri.org/XMLSchema.xsd", IsNullable=false)]
public partial class testType {

    private string nameField;

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute(DataType="token")]
    public string name {
        get {
            return this.nameField;
        }
        set {
            this.nameField = value;
        }
    }
}

A valid XML document according to this schema:

<?xml version="1.0" encoding="utf-8"?>
<testElement xmlns="http://tempuri.org/XMLSchema.xsd">
  <name>A                 B</name>
</testElement>

If I validate the document, the value of the name element is correctly normalized and I get no errors.

However, if I use the following code to deserialize the XML document, the value of name is not normalized.

XmlSerializer xmlSerialiser = new XmlSerializer(typeof(testType));
testType result = (testType)xmlSerialiser.Deserialize(xmlReader);

It would seem like the responsibility for normalizing the value lies with the XmlReader, for which it would need to be aware of the schema. I have tried using XmlReaderSettings as follows, but without success.

XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(xmlSchema);
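For completeness, the attempted setup can be written out end to end roughly as follows. This is a sketch under assumptions: the helper name (ValidatingDeserializer) and the two file-path parameters are hypothetical, and the schema is loaded from a file rather than from the xmlSchema object used above. It validates the document while it is read, but, as described in this question, the deserialized string is still not normalized.

```csharp
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper for illustration; names and parameters are assumptions.
public static class ValidatingDeserializer
{
    public static T Deserialize<T>(string xmlPath, string xsdPath)
    {
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema
        };

        // Passing null uses the targetNamespace declared in the schema file.
        settings.Schemas.Add(null, xsdPath);

        // The reader validates against the schema as the serializer pulls
        // nodes from it.
        using (XmlReader reader = XmlReader.Create(xmlPath, settings))
        {
            var serializer = new XmlSerializer(typeof(T));
            return (T)serializer.Deserialize(reader);
        }
    }
}
```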

Please could someone provide me with an example of how to set up the XmlReader or the XmlSerializer such that the resulting value of the "name" property would be "A B" (a single space) rather than the original text with its full run of spaces between "A" and "B"?

Thanks!

I am not familiar with C# or .NET, but I am familiar with XSD. I know "collapse" is the default and only possible value in this case, but have you tried adding the restriction `<xs:whiteSpace value="collapse"/>`? In addition, maybe the answer to [this question](http://stackoverflow.com/questions/16376414/ignore-whitespace-while-reading-xml) can help you. — sergioFC, Oct 01 '15 at 00:00
The "token" type is indeed defined by the XML base schema with a whiteSpace restriction with the value "collapse". I tried adding this restriction explicitly to the test type defined in my schema, but this had no effect on the deserialization behaviour. — Industrial Zombie, Oct 01 '15 at 15:15
Thank you for your suggestion. If I find the answer, I will post it here. — Industrial Zombie, Oct 01 '15 at 16:06
