I'm currently struggling to use an XmlSerializer together with XSD validation and to collect all the validation errors in the files. The task is to validate each file against custom XSDs that contain value-set information, presence information, etc.

My problem is the following: the XmlReader stops at the first error when we attach a listener to the reader's validation events (the ValidationEventHandler set through XmlReaderSettings). So I simply catch the exception and log the error there. So far everything is fine; the problems start after logging the exception. Right after that, the XmlReader moves to the end tag of the failed field, but I cannot validate the next field because of an exception I cannot explain.

To put it in practice, here's my code where I catch the exception:

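  private bool TryDeserialize(XmlSerializer ser, XmlReader read, out object item)
  {
      // Remember the element name before the subtree reader consumes it.
      string itemName = read.Name;
      XmlReader read2 = read.ReadSubtree();
      try
      {
          item = ser.Deserialize(read2);
          return true;
      }
      catch (Exception e)
      {
          // Deserialize usually wraps the real cause in InnerException, but it can be null.
          string message = (e.InnerException ?? e).Message;
          _ErrorList.Add("XSD error at " + itemName + ": " + message);
          item = null;
          return false;
      }
  }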

This routine works well, but what follows is problematic. Assume I pass the following XML snippet to this code:

  <a>2885</a>
  <b>ABC</b>
  <c>5</c>

Assume that 'b' may not have 'ABC' as a value, so I get an XSD error. Afterwards the XmlReader is positioned at 'EndElement, Name=b', and I cannot move on from there without getting an exception. If I call Read() on the reader, I get the following exception (namespace shortened here):

"e = {"The element 'urn:iso:.....b' cannot contain child element 'urn:iso:.....:c' because the parent element's content model is text only."}"

After this the xmlreader is at 'Element, Name=c', so it seems good, but when trying to deserialize it with the code above, I get the following exception:

'_message = "The transition from the 'ValidateElement' method to the 'ValidateText' method is not allowed."'

I don't really see how to get past this. I also tried without a second reader for the subtree, but the problem is the same. Any suggestions would be welcome, I really am stuck. Thanks a lot in advance!

Greets

user1771386

1 Answer

You may have to consider the following things:

  • In general, it is not always possible to "collect" all the errors, simply because validating parsers are free to abandon the validation process when certain types of errors occur, particularly those that put the validator in a state from which it can't reliably recover. For example, a validator may continue after a constraining facet violation on a simple type, but it will skip a whole section if it runs into unexpected content.

  • Parsing into a DOM and deserializing into an object are (or at least should be) quite different. Loading a DOM is not affected by a validating reader failing validation, because a DOM only requires the document to be well formed. Deserializing, i.e. strong typing, requires it to be valid.
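To illustrate the DOM point, here is a minimal sketch, assuming the XML and the XSD are available as strings. The class and method names (`DomValidation`, `LoadThenValidate`) are hypothetical and not from the question. The document is loaded first, which only needs well-formed XML, and validation then runs over the loaded tree as a separate step:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class DomValidation
{
    // Hypothetical helper: loads the XML into a DOM, then validates the tree.
    // Loading succeeds for any well-formed document; validation problems are
    // only collected as messages, they do not prevent the load.
    public static List<string> LoadThenValidate(string xml, string xsd, out XDocument doc)
    {
        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            // null means: use the targetNamespace declared in the schema itself.
            schemas.Add(null, xsdReader);
        }

        doc = XDocument.Parse(xml);

        var errors = new List<string>();
        // Supplying a handler keeps Validate from throwing on the first error.
        doc.Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```

Whether every error shows up still depends on the kind of error, as described in the first bullet.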

Intuitively, I would ask what the point is of continuing with deserialization, and with further validation, once you have hit a validation error.

Try validating your XML independent of deserialization. If indeed you get more errors flagged with this approach, then the above should explain why. If not, then you're chasing something else.
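A validation pass that is independent of deserialization could look roughly like the following. This is only a sketch under assumptions: the XML and XSD are passed in as strings, and `XsdErrorCollector` and `Collect` are hypothetical names. It reads the whole document with a validating XmlReader and records every message the reader reports through the ValidationEventHandler, without involving XmlSerializer at all:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class XsdErrorCollector
{
    // Hypothetical helper: validates xml against xsd and returns all
    // validation messages reported by the reader.
    public static List<string> Collect(string xml, string xsd)
    {
        var errors = new List<string>();

        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            // null means: use the targetNamespace declared in the schema itself.
            settings.Schemas.Add(null, xsdReader);
        }

        // With a handler attached, validation errors are reported here
        // instead of being thrown as an XmlSchemaException.
        settings.ValidationEventHandler += (sender, e) =>
            errors.Add(e.Severity + " at line " + e.Exception.LineNumber + ": " + e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Just walk the document; validation happens as a side effect of Read().
            while (reader.Read()) { }
        }

        return errors;
    }
}
```

If this pass reports more errors than the deserializing version does, the difference comes from the deserialization step rather than from the validator. A document that is not well formed will still make `Read()` throw an XmlException, so that case has to be caught and reported separately.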

Petru Gardea
  • The purpose of the software is specifically to report all XSD errors, so that the producer of the XML (not me) would be able to correct it. In some cases of course the errors can't be collected if the XML is technically unparseable. But even then my software has to collect all the XSD errors before that point and then report unparsability. – user1771386 Apr 30 '13 at 13:20
  • Right now (as the post is quite old), I'm working on a SAX-based solution that effectively replaces the stock one, since I haven't found anything suitable. It's a monstrous task. :/ – user1771386 Apr 30 '13 at 16:15