
I am using XDocument.Validate (it seems to function the same as XmlDocument.Validate) to validate an XML document against an XSD - this works well and I am informed of validation errors.

However, only some information seems to be exposed [reliably] in the ValidationEventHandler (and XmlSchemaException), e.g.:

  • the error message (i.e. "The 'X' attribute is invalid - The value 'Y' is invalid according to its datatype 'Z' - The Pattern constraint failed"),
  • the severity

What I would like is to get the "failing XPath" for the validation failure (where it makes sense): that is, I would like to get the failure in relation to the XML document (as opposed to the XML text).

Is there a way to obtain the "failing XPath" information from XDocument.Validate? If not, can the "failing XPath" be obtained through another XML validation method such as an XmlValidatingReader?


Background:

The XML will be sent as data to my Web Service with an automatic conversion (via JSON.NET) from JSON to XML. Because of this I begin processing the XDocument data and not text, which has no guaranteed order due to the original JSON data. The REST client is, for reasons I care not to get into, basically a wrapper for HTML form fields over an XML document and validation on the server occurs in two parts - XML schema validation and Business Rule validation.

In the Business Rule validation it's easy to return the "XPath" for the fields which fail conformance that can be used to indicate the failing field(s) on the client. I would like to extend this to the XSD schema validation which takes care of the basic structure validation and, more importantly, the basic "data type" and "existence" of attributes. However, due to the desired automatic process (i.e. highlight the appropriate failing field) and source conversions, the raw text message and the source line/column numbers are not very useful by themselves.


Here is a snippet of the validation code:

// Start with an XDocument object - created from JSON.NET conversion
XDocument doc = GetDocumentFromWebServiceRequest();

// Load XSD    
var reader = new StringReader(EmbeddedResourceAccess.ReadResource(xsdName));
var xsd = XmlReader.Create(reader, new XmlReaderSettings());
var schemas = new XmlSchemaSet();
schemas.Add("", xsd);

// Validate
doc.Validate(schemas, (sender, args) => {
  // Process validation (not parsing!) error,
  // but how to get the "failing XPath"?
});

Update: I found "Capture Schema Information when validating XDocument", which links to "Accessing XML Schema Information During Document Validation", from which I determined two things:

  1. XmlSchemaException can be specialized into XmlSchemaValidationException which has a SourceObject property - however, this always returns null during validation: "When an XmlSchemaValidationException is thrown during validation by a validating XmlReader object, the value of the SourceObject property is null".

  2. I can read through the document (via XmlReader.Read) and "remember" the path prior to the validation callback. While this "seems like it works" in initial tests (without a ValidationCallback), it feels quite inelegant to me - but I've been able to find little else.

  • please show your XML that you are working against.. we are not mind readers here.. thanks – MethodMan Feb 03 '13 at 21:46
  • @DJKRAZE I do not believe the XML is of importance - in my case I could currently pick one of 10 XSDs and many more [purposefully] failing test-case data against such. I am not concerned with *why* it fails (this is in the message); I want to get the "XPath" of *where* it fails - the XML data ultimately comes from a REST request, and I'd like to provide a "more useful response". The message I posted comes from a "recoverable error", as in validation can continue, which is what I am most interested in. If the document is invalid under XML parsing rules then all bets are off. –  Feb 03 '13 at 21:51

6 Answers


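The sender of the validation event is the source of the event. So you can find code online that builds an XPath for a node (e.g. Generating an XPath expression) and generate the XPath for the event's source:

string xpath = null;
doc.Validate(schemas, (sender, args) => {
  // With XDocument.Validate the sender is the XObject being validated
  // (e.g. an XElement or XAttribute). GetXPath() is the extension method
  // from the linked answer; it is not part of System.Xml.Linq.
  if (sender is XObject xobj)
  {
     xpath = xobj.GetXPath();
  }
});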
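Since GetXPath() is not a framework method, here is one possible version as a hedged sketch (it is not the code behind the linked answer; the class name is made up). It builds a positional path such as /root[1]/item[2]/@id from an XElement or XAttribute. It assumes elements without namespaces: only local names are emitted, so a namespaced document would need prefixes or local-name() tests before the path could be evaluated.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: positional XPath for an XElement or XAttribute.
// Only local names are used; namespaces are ignored in this sketch.
public static class XObjectXPathExtensions
{
    public static string GetXPath(this XObject obj)
    {
        switch (obj)
        {
            case XAttribute attribute:
                // An attribute's path is its owner element's path plus /@name
                string ownerPath = attribute.Parent == null ? string.Empty : attribute.Parent.GetXPath();
                return ownerPath + "/@" + attribute.Name.LocalName;

            case XElement element:
                // 1-based position among preceding siblings with the same name
                int position = element.ElementsBeforeSelf(element.Name).Count() + 1;
                // Parent is null for the root element, which ends the recursion
                string parentPath = element.Parent == null ? string.Empty : element.Parent.GetXPath();
                return parentPath + "/" + element.Name.LocalName + "[" + position + "]";

            default:
                // Other node types (text, comments, the document) are not handled here
                return string.Empty;
        }
    }
}
```

With this in place, calling GetXPath() on the sender inside the validation handler yields paths like /root[1]/item[2]/@id, which the client can map back to the offending field.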
Sergey Berezovskiy
  • Thanks - sneaky sender, I never even considered you! This works pretty well for finding invalid attributes (e.g. the sender is an XAttribute that fails a pattern or is an XElement that is not a valid child), but it seems like it will take a bit more work to find paths for *missing* elements as, well, logically, they have no corresponding XObject. That is, how to know which "case" of error it is - invalid element or invalid parent of missing element? And in the latter, how to get the path of the element that *should* exist. –  Feb 04 '13 at 06:51
  • @pst well, if you have a missing element, then I think it's a validation error of the element's container (you will receive *element X has incomplete content* error). So, looks like you should provide path to parent element, which failed validation, not to missing element (which is actually missing, so path will not have any sense in that case). – Sergey Berezovskiy Feb 04 '13 at 08:25
  • Normally I'd agree wholeheartedly - but in this case it's for informing the (relatively dumb) WS client about the issue so a missing required attribute and an invalid attribute are about the same to it (it normalizes out null JSON attributes) .. the only alternative I can think of (which might just be me not thinking well enough) is to accept some "" for all attributes. In any case, I ended up just using a regular expression match/capture for the case of a "missing attribute" that works for my use case. Just don't expect it to run on non en-US threads .. :D Thanks again. –  Feb 04 '13 at 19:48

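Take it :-) The idea: validate with a schema-validating XmlReader and keep a stack of the names of the currently open elements. The validation handler runs inside reader.Read(), so right after each Read() the stack holds the path to the node that raised the error.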

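using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Schema;

// schemas, xmlDocumentStream, MyXmlValidationError and ValidationCallback
// are your own: the schema list, the XML input, and your error type/factory
var xpath = new Stack<string>();   // names of the currently open elements

var settings = new XmlReaderSettings
               {
                   ValidationType = ValidationType.Schema,
                   // keep the default flags and add warnings on top
                   ValidationFlags = XmlSchemaValidationFlags.ProcessIdentityConstraints |
                                     XmlSchemaValidationFlags.AllowXmlAttributes |
                                     XmlSchemaValidationFlags.ReportValidationWarnings,
               };
MyXmlValidationError error = null;
// The handler runs inside reader.Read(); if one Read() raises several
// errors, only the last one is kept by this simple version
settings.ValidationEventHandler += (sender, args) => error = ValidationCallback(sender, args);
foreach (var schema in schemas)
{
    settings.Schemas.Add(schema);
}

using (var reader = XmlReader.Create(xmlDocumentStream, settings))
{
    // validation
    while (reader.Read())
    {
        // entering an element: push its name
        if (reader.NodeType == XmlNodeType.Element)
        {
            xpath.Push(reader.Name);
        }

        if (error != null)
        {
            // set "failing XPath" (Stack enumerates innermost first, so reverse it)
            error.XPath = xpath.Reverse().Aggregate(string.Empty, (x, y) => x + "/" + y);

            // your error with XPath now

            error = null;
        }

        // leaving an element (or an empty one): pop only after reporting,
        // so end-of-element errors still point at that element
        if (reader.NodeType == XmlNodeType.EndElement ||
            (reader.NodeType == XmlNodeType.Element && reader.IsEmptyElement))
        {
            xpath.Pop();
        }
    }
}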
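For anyone who wants to try the stack idea end to end, here is a self-contained sketch under the same assumptions: the XML is available as text, and the "path" is just the chain of open element names (no positional predicates, no namespaces). Unlike the snippet above, it keeps every message raised by a single Read() instead of only the last one. The class and method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper illustrating the element-stack technique.
public static class PathTrackingValidator
{
    public static List<(string Path, string Message)> Validate(string xml, string xsd)
    {
        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            schemas.Add(null, xsdReader);
        }

        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema, Schemas = schemas };

        var openElements = new Stack<string>();   // innermost element on top
        var pending = new List<string>();         // messages raised by the current Read()
        var results = new List<(string Path, string Message)>();

        // The handler runs synchronously inside reader.Read(), so messages are
        // collected before the loop body sees the node that caused them.
        settings.ValidationEventHandler += (sender, args) => pending.Add(args.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    openElements.Push(reader.LocalName);
                }

                // Stack<T> enumerates top-first; reverse it to get root-first order.
                string path = "/" + string.Join("/", openElements.Reverse());
                foreach (var message in pending)
                {
                    results.Add((path, message));
                }
                pending.Clear();

                // Pop after reporting so end-of-element errors (e.g. incomplete
                // content) are attributed to the element that just closed.
                if (reader.NodeType == XmlNodeType.EndElement ||
                    (reader.NodeType == XmlNodeType.Element && reader.IsEmptyElement))
                {
                    openElements.Pop();
                }
            }
        }

        // Anything raised by the final Read() has no open element left.
        foreach (var message in pending)
        {
            results.Add(("/", message));
        }
        return results;
    }
}
```

An invalid attribute value is reported against the element that carries it (e.g. /root/item), and a missing child is reported against its parent when the parent closes, which matches the "incomplete content" behaviour discussed in the comments on the first answer.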
dizel3d
  • Either it's early and my brain is jumbled or this code is jumbled. how is this implemented? there are missing explanations ): – Kristopher Jul 15 '20 at 13:59

I don't know the API, but my guess is no, you can't get the XPath, because validation may be implemented as a finite state machine. A state may not translate to an XPath, or, when more than one element may validly follow and the element found is not in the expected set, the XPath doesn't exist.

Jay Walker
  • I don't believe that's the case here and I believe it's possible: but I'm not sure how much has to be done. I have an XML document to start (XDocument, but I can get an XmlDocument just the same) so a "valid" XML DOM exists after parsing - unlike an SGML-based language, XML syntax doesn't change depending on schema. However, this document is not initially validated against any particular schema - and I'd like to validate it against a particular schema. When applying the validation I'd like to find out *where* in this document (i.e. which Element/Attribute) the validation fails. –  Feb 04 '13 at 04:10
  • That is, it seems to me that this should be possible given the fact that the DOM exists - it's not failing validation on a streaming read in my case. –  Feb 04 '13 at 04:38

Finally succeeded this way!

When I use XmlReader.Create(xmlStream, settings) and xmlRdr.Read() to validate XML, I inspected the sender of the ValidationEventHandler and found it is a System.Xml.XsdValidatingReader, so I cast the sender to XmlReader. The XmlReader class has methods that help you find the parent element of the failing attribute.

One thing to watch out for: if I call XmlReader.MoveToElement() and leave the reader there, validation gets stuck in a loop on the failing attribute, so I move back with MoveToAttribute(attrName), which lets validation continue with MoveToNextAttribute. Maybe there is a more elegant way to handle this.

Without further ado, below is my code.

using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

public string XMLValidation(string XMLString, string SchemaPath)
{
    string error = string.Empty;

    XmlSchemaSet schemas = new XmlSchemaSet();
    schemas.Add(null, SchemaPath);

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas.Add(schemas);

    settings.ValidationEventHandler += new ValidationEventHandler(delegate(object sender, ValidationEventArgs e)
    {
        switch (e.Severity)
        {
            case XmlSeverityType.Error:
                // The sender is the validating reader, positioned on the failing node
                XmlReader senRder = (XmlReader)sender;
                if (senRder.NodeType == XmlNodeType.Attribute)
                {
                    // When the error is in an attribute, report its parent element name,
                    // then move back to the attribute so validation does not loop
                    string attrName = senRder.Name;
                    senRder.MoveToElement();
                    error += string.Format("ERROR:ElementName'{0}':{1}{2}", senRder.Name, e.Message, Environment.NewLine);
                    senRder.MoveToAttribute(attrName);
                }
                else
                {
                    error += string.Format("ERROR:ElementName'{0}':{1}{2}", senRder.Name, e.Message, Environment.NewLine);
                }
                break;
            case XmlSeverityType.Warning:
                break;
        }
    });

    // Read the whole document (this drives validation), then dispose stream and reader
    using (MemoryStream xmlStream = new MemoryStream(Encoding.UTF8.GetBytes(XMLString)))
    using (XmlReader xmlRdr = XmlReader.Create(xmlStream, settings))
    {
        while (xmlRdr.Read()) { }
    }
    return error;
}
王英俊

Alternatively you could use the code at How to find an XML node from a line and column number in C#? to get the failing node by using the args.Exception.LineNumber and args.Exception.LinePosition and then navigate the XML document as required to provide more information about what data caused the validation to fail.
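A rough sketch of that lookup, under some assumptions: the XML must have been parsed from text (line numbers do not exist for a document built in memory, such as the JSON.NET conversion in the question), and an XDocument only carries line info when loaded with LoadOptions.SetLineInfo. Because the column an exception reports may not coincide exactly with a node's start, this sketch picks the element or attribute on that line that starts closest to, but not after, the reported position; that matching rule is a guess, not documented behaviour. The class name is hypothetical.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: maps a (line, position) pair such as
// args.Exception.LineNumber / LinePosition back to an XElement or XAttribute.
public static class LineInfoLookup
{
    public static XObject FindNearest(XDocument doc, int line, int position)
    {
        return doc.Descendants()
            // Consider every element together with its attributes
            .SelectMany(e => new XObject[] { e }.Concat<XObject>(e.Attributes()))
            .Select(o => (Node: o, Info: (IXmlLineInfo)o))
            // Without LoadOptions.SetLineInfo no node has line info, so the result is null
            .Where(c => c.Info.HasLineInfo()
                        && c.Info.LineNumber == line
                        && c.Info.LinePosition <= position)
            // The node starting closest to (at or before) the reported column wins
            .OrderByDescending(c => c.Info.LinePosition)
            .Select(c => c.Node)
            .FirstOrDefault();
    }
}
```

Once you have the XObject, you can walk its Parent chain to build whatever path or context the client needs.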

Mike Wade

If, like me, you are using the "XmlDocument.Validate(ValidationEventHandler validationEventHandler)" method to validate your XML:

// Errors and alerts collection
private ICollection<string> errors = new List<String>();

// Open xml and validate
...
{
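    // Create XMLFile, load it and attach the schema
    // (xmlPath and xsdPath are placeholders for your own file locations)
    XmlDocument XMLFile = new XmlDocument();
    XMLFile.Load(xmlPath);
    XMLFile.Schemas.Add(null, xsdPath);

    // Validate the XML file
    XMLFile.Validate(ValidationCallBack);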
}

// Manipulator of errors occurred during validation
private void ValidationCallBack(object sender, ValidationEventArgs args)
{
    if (args.Severity == XmlSeverityType.Warning)
    {
        errors.Add("Alert: " + args.Message + " (Path: " + GetPath(args) + ")");
    }
    else if (args.Severity == XmlSeverityType.Error)
    {
        errors.Add("Error: " + args.Message + " (Path: " + GetPath(args) + ")");
    }
}

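The trick is to use the Exception property of args: with XmlDocument.Validate it is an XmlSchemaValidationException whose SourceObject is the node that failed. Do it like this:

// Return the path of the node that failed validation
private string GetPath(ValidationEventArgs args)
{
    // SourceObject is the failing XmlElement or XmlAttribute (it may also be null)
    var node = (args.Exception as XmlSchemaValidationException)?.SourceObject as XmlNode;
    if (node is XmlAttribute attribute)
    {
        return iterateParentNode(attribute.OwnerElement) + "/@" + attribute.Name;
    }
    return iterateParentNode(node);
}

// Build "/root/child/..." by walking up through the element ancestors
private string iterateParentNode(XmlNode node)
{
    if (node == null || node.NodeType != XmlNodeType.Element)
    {
        return "";
    }
    return iterateParentNode(node.ParentNode) + "/" + node.Name;
}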
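For a quick end-to-end check of this approach, here is a self-contained sketch (the class and method names are made up) that loads XML text into an XmlDocument, attaches a schema, and returns one path per validation message, with attribute failures reported as /element/@attribute. It relies on the behaviour this answer uses: SourceObject is filled in when validating an XmlDocument, whereas, as the question notes, a validating XmlReader leaves it null.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical wrapper around the XmlDocument.Validate / SourceObject approach.
public static class XmlDocumentPathValidator
{
    public static List<string> Validate(string xml, string xsd)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            doc.Schemas.Add(null, xsdReader);
        }

        var paths = new List<string>();
        doc.Validate((sender, args) =>
        {
            // The failing node travels on the exception, not on the sender
            var node = (args.Exception as XmlSchemaValidationException)?.SourceObject as XmlNode;
            paths.Add(PathOf(node));
        });
        return paths;
    }

    private static string PathOf(XmlNode node)
    {
        switch (node)
        {
            case XmlAttribute attribute:
                return PathOf(attribute.OwnerElement) + "/@" + attribute.Name;
            case XmlElement element:
                return PathOf(element.ParentNode) + "/" + element.Name;
            default:
                // The document node (or null) ends the upward walk
                return string.Empty;
        }
    }
}
```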
Silvair L. Soares