8

I have a string variable containing XML, and an XSD file. I need to validate the XML in the string against the XSD, and I know there is more than one way to do it (XmlDocument, XmlReader, ...?).

After the validation I just have to store the XML, so I don't need it in an XDocument or XmlDocument.

What's the way to go if I want the fastest performance?

Hinek

4 Answers

12

Others have already mentioned the XmlReader class for doing the validation, so I won't elaborate on that.

Your question does not give much context. Will you be doing this validation repeatedly for several XML documents, or just once? I'm assuming a scenario where you validate a lot of XML documents (from a third-party system?) and store them for future use.

My contribution to the performance hunt is to use a compiled XmlSchemaSet, which should be thread safe once compiled, so several threads can reuse it without parsing the XSD document again.

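// Parse the XSD once. `stream` is an open Stream over the XSD file;
// passing null means no ValidationEventHandler is attached while reading.
var xmlSchema = XmlSchema.Read(stream, null);
var xmlSchemaSet = new XmlSchemaSet();
xmlSchemaSet.Add(xmlSchema);
xmlSchemaSet.Compile();

// CachedSchemas is an IDictionary keyed by a schema name of your choosing.
CachedSchemas.Add(name, xmlSchemaSet);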
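
The caching idea above could be sketched roughly as follows. This is only an illustration: `SchemaCache`, `Get`, and the choice of the XSD file path as the cache key are assumptions, not part of the original answer, and it relies on the compiled set being read-only after compilation.

```csharp
using System.Collections.Concurrent;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper: compiles each XSD file once and hands out the cached set.
public static class SchemaCache
{
    private static readonly ConcurrentDictionary<string, XmlSchemaSet> CachedSchemas =
        new ConcurrentDictionary<string, XmlSchemaSet>();

    public static XmlSchemaSet Get(string xsdPath)
    {
        return CachedSchemas.GetOrAdd(xsdPath, path =>
        {
            var schemaSet = new XmlSchemaSet();
            using (var reader = XmlReader.Create(path))
            {
                // A null target namespace means "use the one declared in the XSD".
                schemaSet.Add(null, reader);
            }
            schemaSet.Compile();
            return schemaSet;
        });
    }
}
```

Note that `GetOrAdd` may run the factory more than once if several threads race on the same key; only one result is kept, so the worst case is a redundant compile.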
sisve
  • 1
    Yes, I validate and store a lot of XML documents from a third-party system for later use. The XSD is always the same, so your hint about compiling the schema set is much appreciated, thanks! – Hinek Sep 10 '10 at 08:02
  • 3
    What is `CachedSchemas` in this example? – Welton v3.62 Jul 05 '11 at 22:01
  • 1
    Just an IDictionary for caching the results. – sisve Jul 06 '11 at 04:55
  • Why do you think XmlSchemaSet is thread safe? http://blogs.msdn.com/b/xmlteam/archive/2009/04/27/xmlschemaset-thread-safety.aspx – RichB Oct 11 '12 at 08:17
  • 1
    @RichB, that example works just as I described: initialize an XmlSchemaSet, compile it, and then use it from several threads. But no, there is no support for what I am saying in any documentation I can find. – sisve Oct 11 '12 at 11:12
  • You might want to look into `XmlReaderSettings.IgnoreComments`, `IgnoreWhitespace` and `IgnoreProcessingInstructions`; My tests only apply to XML files without comments but if yours contain heavy comments, it might help (to verify) – Christian Rondeau Jun 01 '15 at 18:04
3

I would go for the XmlReader with XmlReaderSettings because it does not need to load the complete XML document into memory. It will be more efficient for big XML files.

Johann Blais
2

I think the fastest way is to use an XmlReader that validates the document as it is being read. This allows you to validate the document in only one pass: http://msdn.microsoft.com/en-us/library/hdf992b8.aspx
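
A minimal sketch of that one-pass approach, assuming a precompiled `XmlSchemaSet` is passed in (`FastValidator` and `IsValid` are made-up names). Because no `ValidationEventHandler` is attached, the reader throws an `XmlSchemaValidationException` at the first validation error, so invalid documents are rejected without reading the rest.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class FastValidator
{
    // Returns true if xml is well-formed and raises no validation error
    // against the given schemas; stops at the first error.
    public static bool IsValid(string xml, XmlSchemaSet schemas)
    {
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                // Validation happens as a side effect of reading each node.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlSchemaValidationException) { return false; } // schema violation
        catch (XmlException) { return false; }                 // not well-formed
    }
}
```

One caveat: validation warnings (for example, a root element that no loaded schema declares) do not throw, so this sketch treats such documents as valid.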

Rune Grimstad
0

Use an XmlReader configured to perform validation, with the source being a TextReader.

If you don't want to rely on declarations in the input document, you can manually specify the XSD that the XmlReader should use via the XmlReaderSettings.Schemas property.

A start (which just assumes XSD-instance declarations in the input document) would be:

var settings = new XmlReaderSettings {
   ConformanceLevel = ConformanceLevel.Document,
   ValidationType = ValidationType.Schema,
   ValidationFlags = XmlSchemaValidationFlags.ProcessSchemaLocation |
                     XmlSchemaValidationFlags.ProcessInlineSchema,
};

int warnings = 0;
int errors = 0;
settings.ValidationEventHandler += (obj, ea) => {
   if (ea.Severity == XmlSeverityType.Warning) {
      ++warnings;
   } else {
      ++errors;
   }
};

XmlReader xvr = XmlReader.Create(new StringReader(inputDocInString), settings);

try {
   while (xvr.Read()) {
      // do nothing
   }

   if (0 != errors) {
      Console.WriteLine("\nFailed to load XML, {0} error(s) and {1} warning(s).", errors, warnings);
   } else if (0 != warnings) {
      Console.WriteLine("\nLoaded XML with {0} warning(s).", warnings);
   } else {
      System.Console.WriteLine("Loaded XML OK");
   }

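   Console.WriteLine("\nSchemas loaded during validation:");
   // ListSchemas is a helper not shown in this answer. XmlReader has no
   // Schemas property of its own; the schema set is reached via Settings.
   ListSchemas(xvr.Settings.Schemas, 1);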

} catch (System.Xml.Schema.XmlSchemaException e) {
   System.Console.Error.WriteLine("Failed to read XML: {0}", e.Message);
} catch (System.Xml.XmlException e) {
   System.Console.Error.WriteLine("XML Error: {0}", e.Message);
} catch (System.IO.IOException e) {
   System.Console.Error.WriteLine("IO error: {0}", e.Message);
}
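
For the other option mentioned above, supplying the XSD explicitly through `XmlReaderSettings.Schemas`, a rough sketch might look like this. The class name, method name, and `xsdPath` parameter are placeholders; the `ReportValidationWarnings` flag is added on the assumption that you also want to hear about elements that no loaded schema covers.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ExplicitSchemaValidator
{
    // Returns one message per validation error or warning (empty list = valid).
    public static List<string> Validate(string xml, string xsdPath)
    {
        var problems = new List<string>();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };

        // null target namespace: take the namespace declared in the XSD itself.
        settings.Schemas.Add(null, xsdPath);
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (sender, e) =>
            problems.Add(e.Severity + ": " + e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return problems;
    }
}
```

Malformed XML still surfaces as an `XmlException` from `Read()`, so callers would need to catch that separately, as the answer's code does.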
Richard