I have an XML schema that defines a simple type based on xs:token, with a maximum length restriction.
When I validate an XML document against this schema, the validator correctly applies whitespace normalization to the content. Leading and trailing whitespace is removed, and each run of contiguous whitespace characters is replaced by a single space. E.g. "A    B" is normalized to "A B" before the maxLength restriction is checked.
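For reference, the xs:token "collapse" rule and the way maxLength is checked against the collapsed value can be sketched in C# as follows. This is a minimal sketch of the rule as the XML Schema spec describes it, not the validator's actual implementation. `TokenNormalizer`, `CollapseToken` and `IsValidShortToken` are hypothetical names, and the default of 5 simply mirrors the `shortToken` type below.

```csharp
using System;
using System.Text;

public static class TokenNormalizer
{
    // Hypothetical helper applying the XSD "collapse" whitespace rule used by xs:token:
    // tab, line feed and carriage return count as spaces, each run of whitespace
    // becomes a single space, and leading/trailing whitespace is dropped.
    public static string CollapseToken(string value)
    {
        if (value == null)
        {
            return null;
        }

        var sb = new StringBuilder(value.Length);
        bool pendingSpace = false;

        foreach (char c in value)
        {
            bool isXmlWhitespace = c == ' ' || c == '\t' || c == '\n' || c == '\r';
            if (isXmlWhitespace)
            {
                // Remember the gap, but never emit a leading space.
                pendingSpace = sb.Length > 0;
            }
            else
            {
                // Emit at most one space for the whole run before the next character.
                if (pendingSpace)
                {
                    sb.Append(' ');
                    pendingSpace = false;
                }
                sb.Append(c);
            }
        }

        // Trailing whitespace is dropped because pendingSpace is never flushed.
        return sb.ToString();
    }

    // Mirrors the shortToken facet check: maxLength is applied to the collapsed
    // value (counted here in UTF-16 code units, which matches for BMP text).
    public static bool IsValidShortToken(string value, int maxLength = 5)
    {
        return value != null && CollapseToken(value).Length <= maxLength;
    }
}
```

With this rule, "A    B" (6 characters) collapses to "A B" (3 characters), which is why the document passes validation even though the raw text is longer than 5 characters.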
However, when I deserialize the same XML document into the classes generated by xsd.exe, this normalization is not applied, so the resulting strings can be longer than the schema allows.
For reference, I'm using C# and .NET 4.5.2.
Here's a minimal example to demonstrate the issue.
Example XML schema:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="http://tempuri.org/XMLSchema.xsd"
    elementFormDefault="qualified"
    xmlns="http://tempuri.org/XMLSchema.xsd"
    xmlns:mstns="http://tempuri.org/XMLSchema.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>
  <xs:element name="testElement" type="testType"/>
  <xs:complexType name="testType">
    <xs:sequence>
      <xs:element name="name" type="shortToken"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="shortToken">
    <xs:restriction base="xs:token">
      <xs:maxLength value="5"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
The class that xsd.exe generates from this schema:
[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "4.0.30319.18020")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://tempuri.org/XMLSchema.xsd")]
[System.Xml.Serialization.XmlRootAttribute("testElement", Namespace="http://tempuri.org/XMLSchema.xsd", IsNullable=false)]
public partial class testType {
    private string nameField;
    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute(DataType="token")]
    public string name {
        get {
            return this.nameField;
        }
        set {
            this.nameField = value;
        }
    }
}
A valid XML document according to this schema:
<?xml version="1.0" encoding="utf-8"?>
<testElement xmlns="http://tempuri.org/XMLSchema.xsd">
  <name>A    B</name>
</testElement>
If I validate the document, the value of the name element is correctly normalized and I get no errors.
However, if I use the following code to deserialize the XML document, the value of name is not normalized.
XmlSerializer xmlSerialiser = new XmlSerializer(typeof(testType));
testType result = (testType)xmlSerialiser.Deserialize(xmlReader);
It seems that normalizing the value would be the XmlReader's responsibility, which would require the reader to be aware of the schema. I have tried creating the XmlReader with the following XmlReaderSettings, but without success:
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(xmlSchema);
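One possible workaround, sketched under assumptions rather than offered as the definitive fix: keep the validating reader (so over-long values are still rejected), deserialize as before, and then apply the xs:token collapse rule to token-typed properties explicitly. `TokenDeserializer`, `Load`, `NormalizeTokens` and `CollapseToken` are hypothetical names. The `testType` class here is a hand-trimmed copy of the generated one, and the second `partial` declaration stands in for a separate file that regenerating with xsd.exe would not overwrite. Whether XmlSerializer does any trimming of its own for `DataType="token"` is not confirmed here; the explicit normalization makes the result independent of that.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hand-trimmed stand-in for the xsd.exe-generated class.
[XmlType(Namespace = "http://tempuri.org/XMLSchema.xsd")]
[XmlRoot("testElement", Namespace = "http://tempuri.org/XMLSchema.xsd", IsNullable = false)]
public partial class testType
{
    [XmlElement(DataType = "token")]
    public string name { get; set; }
}

// Hypothetical extension, intended to live in its own file next to the generated one.
public partial class testType
{
    // Every xs:token-typed property has to be listed here by hand.
    public void NormalizeTokens()
    {
        name = CollapseToken(name);
    }

    // Splitting on XML whitespace and rejoining with single spaces gives the
    // same result as the xs:token collapse rule (including trimming).
    private static string CollapseToken(string value)
    {
        if (value == null) return null;
        string[] parts = value.Split(new[] { ' ', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }
}

public static class TokenDeserializer
{
    public static testType Load(string xml, XmlSchemaSet schemas)
    {
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(schemas);
        // Fail loudly on validation errors instead of letting them pass silently.
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) throw e.Exception;
        };

        var serializer = new XmlSerializer(typeof(testType));
        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            var result = (testType)serializer.Deserialize(xmlReader);
            // The validator checked the collapsed value, but the deserialized
            // string is not collapsed, so apply the rule explicitly.
            result.NormalizeTokens();
            return result;
        }
    }
}
```

The design choice is to leave the generated file untouched and normalize after deserialization. The cost is that the list of token properties in `NormalizeTokens` must be kept in sync with the schema by hand.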
Could someone please provide an example of how to set up the XmlReader or the XmlSerializer so that the resulting value of the "name" property is "A B" rather than "A    B"?
Thanks!