I have legacy XML documents that contain nested (non-root) elements that I want to validate against an XML Schema. The schema itself does not describe the XML document as a whole, but only a particular nested element.
The XML document represents a message received from a 3rd party system, has no xmlns attributes, and does not even have an XML declaration. It's a legacy format that I cannot influence. Example:
<XM>
  <MH> … nested header elements … </MH>
  <MD>
    <RECSET>
      … payload elements go here …
    </RECSET>
  </MD>
</XM>
My aim is to validate /XM/MD/RECSET against an XSD which defines the RECSET element and any payload elements nested within it. I do not have schemas that describe the outer elements, i.e. XM, MH, and MD. I could modify all existing schemas and add dummy definitions, e.g. allowing for xs:all, but that is not preferred.
The validation is an optional step in a processing pipeline, and I want to avoid unnecessarily repeated XML parsing and other processing which adds execution time (throughput is important).
Another constraint is that I want to use XmlDocument, because further down the processing pipeline I need an XmlDocument instance to perform deserialization into an object model using XmlSerializer. Again, this is an existing solution that I want to preserve.
My attempt is as follows:
// build an XmlDocument instance as the intermediate format of the message
var xml = new XmlDocument();
xml.LoadXml(msg.TransportMessage);
// obtain a pre-cached XmlSchemaSet instance matching the message represented by XmlDocument
XmlSchemaSet schemaSet = … ;
// find the whole payload represented by the RECSET element
var nodeToValidate = xml.SelectSingleNode("/XM/MD/RECSET");
// attach schemas to the document and validate the payload node
xml.Schemas = schemaSet;
xml.Validate(ValidationCallback, nodeToValidate);
This results in an error:
Schema information could not be found for the node passed into Validate. The node may be invalid in its current position. Navigate to the ancestor that has schema information, then call Validate again.
I've looked into the implementation of XmlDocument and the DocumentSchemaValidator class, which, when validating a specific node, searches the DOM for schema information. Hence I tried attaching a reference to the correct schema to the node ad hoc:
XmlAttribute noNamespaceAttribute = xml.CreateAttribute("xsi:noNamespaceSchemaLocation", "http://www.w3.org/2001/XMLSchema-instance");
foreach (XmlSchemaElement x in schemaSet.GlobalElements.Values)
{
    if (x.Name == "RECSET")
    {
        noNamespaceAttribute.InnerText = x.SourceUri!;
        break;
    }
}
nodeToValidate.Attributes!.Append(noNamespaceAttribute);
However, that results in the very same error message.
A working way to achieve such validation is to take nodeToValidate.OuterXml and parse it again, either with a validating XmlReader or into a new XmlDocument instance. However, that adds memory and CPU overhead, so I'd rather avoid this route.
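For reference, the re-parsing workaround looks roughly like the sketch below. ValidateFragment and FragmentValidation are hypothetical names I made up for illustration, and counting errors and warnings is just one way to surface the result; the real pipeline uses its own callback. Warnings are switched on because, as far as I can tell, an element with no matching schema definition is reported only as a warning.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class FragmentValidation
{
    // Hypothetical helper: serializes the node and re-parses it through a
    // validating XmlReader. Returns the number of reported errors and warnings.
    public static int ValidateFragment(XmlNode nodeToValidate, XmlSchemaSet schemaSet)
    {
        int problems = 0;

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemaSet
        };
        // Keep the default flags and additionally report warnings.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (sender, e) => problems++;

        // This is the costly part: OuterXml serializes the subtree to a string,
        // which is then parsed a second time.
        using var text = new StringReader(nodeToValidate.OuterXml);
        using var reader = XmlReader.Create(text, settings);
        while (reader.Read())
        {
            // Validation happens as a side effect of reading.
        }

        return problems;
    }
}
```

Calling FragmentValidation.ValidateFragment(nodeToValidate, schemaSet) returns 0 for a valid payload, but the extra serialization and second parse are exactly the overhead I want to get rid of.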
Is there a way to tell the validation engine to validate a particular node against an explicitly specified schema?