
I'm loading an XML document in my C# application with the following:

XDocument xd1 = XDocument.Load(myfile);

but before that, I do test to make sure the file exists with:

File.Exists(myfile);

But... is there an (easy) way to test the file before the XDocument.Load() to make sure it's a valid XML file? In other words, my user can accidentally click on a different file in the file browser and trying to load, say, a .php file causes an exception.

The only way I can think of is to load it into a StreamReader and simply do a text search on the first few characters to make sure they say "

Thanks!

-Adeena

adeena

7 Answers


It's probably just worth catching the specific exception if you want to show a message to the user:

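 try
 {
   // Load() parses the file and throws XmlException if it is not well-formed.
   XDocument xd1 = XDocument.Load(myfile);
 }
 catch (XmlException)
 {
     // ShowMessage stands for your own UI code (e.g. MessageBox.Show).
     ShowMessage("Your XML was probably bad...");
 }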
Jennifer
    This works fine, but if we can reasonably expect that an exception can happen often, isn't that getting to the point where we're using exception handling to manage flow, when we should be using an if statement or something? Seems like XmlDocument should have a TryLoad method something like int.TryParse(), or an IsWellFormed(xml) method... – MGOwen Apr 29 '15 at 02:29
  • @MGOwen you could always add such a method as an extension :) – defines Mar 15 '17 at 20:29
    @MGOwen Agreed. Exceptions aren't for flow control. This is C#, not Python. It would be nice if there was a real solution for this, seven years later. – A.R. Aug 02 '22 at 02:43
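For what it's worth, a minimal sketch of the `TryLoad` idea from these comments might look like this. `XDocumentHelper` and `TryLoad` are hypothetical names (as far as I know the framework has no such method), and the helper still catches `XmlException` internally; it only keeps the try/catch out of the calling code.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper modelled on int.TryParse(); not a framework API.
public static class XDocumentHelper
{
    // Returns false (instead of throwing) when the file is not well-formed XML.
    // File-system errors such as FileNotFoundException are deliberately NOT caught here.
    public static bool TryLoad(string path, out XDocument document)
    {
        try
        {
            document = XDocument.Load(path);
            return true;
        }
        catch (XmlException)
        {
            document = null;
            return false;
        }
    }
}
```

Usage: `if (XDocumentHelper.TryLoad(myfile, out XDocument xd1)) { /* use xd1 */ } else { ShowMessage("Your XML was probably bad..."); }`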

This question confuses a "well-formed" XML document with a "valid" one.

A valid XML document is by definition a well-formed document. Additionally, it must satisfy a DTD or a schema (an XML Schema, a RELAX NG schema, Schematron rules or other constraints) to be valid.

Judging from the wording of the question, most probably it asks:

"How to make sure a file contains a well-formed XML document?".

The answer is that an XML document is well-formed if it can be parsed successfully by a conforming XML parser. As the XDocument.Load() method does exactly this, you only need to catch the exception and then conclude that the text contained in the file is not well-formed.
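If the file may be large, the same "let a conforming parser decide" check can be done without building the whole tree in memory: run the file through `XmlReader`, which throws `XmlException` at the first well-formedness error. A minimal sketch; `WellFormedness.IsWellFormed` is a hypothetical name, and `DtdProcessing.Ignore` is an assumption on my part (the default, `Prohibit`, rejects any document that contains a DOCTYPE).

```csharp
using System.Xml;

public static class WellFormedness
{
    // Streams the file node by node; no DOM is built, so memory use stays small.
    // A missing or locked file still throws an IOException to the caller.
    public static bool IsWellFormed(string path)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore // assumption: skip DOCTYPEs rather than reject them
        };
        try
        {
            using (XmlReader reader = XmlReader.Create(path, settings))
            {
                // Read() throws XmlException at the first syntax error
                // (mismatched tags, missing root element, bad characters, ...).
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```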

Dimitre Novatchev

Just load it and catch the exception. Same for File.Exists() - the file system is volatile so just because File.Exists() returns true doesn't mean you'll be able to open it.

Wai Ha Lee
Joel Coehoorn
  • Please can you elaborate on why it is volatile and in what situations it might fail? – ANewGuyInTown Feb 21 '19 at 05:22
    @ANewGuyInTown Volatile means it can change separately from your program from one instant to the next. Modern operating systems, including Windows, Linux, and OS X, all do _preemptive multitasking_ for managing processes. This means the OS thread scheduler can, _at any instant_, pause your process in the middle of a method and swap it out for a different process. It's therefore possible a considerable amount of CPU time passes between when your code checks `.Exists()` and when it then acts on a `true` result, such that `true` would now be `false`. There are other issues with `.Exists()`, too – Joel Coehoorn Feb 21 '19 at 14:50
    @ANewGuyInTown (continued) Most other things don't matter: memory in your program is _your_ memory, allocated for your program, and shouldn't change out from under you. Once you actually open a file, you can lock to keep it safe. Network sockets, gdi resources, semaphores, etc, all get locked to your program. But the file system is _shared_, so checking `.Exists()`, which does not lock the file, is _dangerous_. – Joel Coehoorn Feb 21 '19 at 14:53
  • Thanks @Joel Coehoorn. Great response! – ANewGuyInTown Feb 21 '19 at 23:37
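Applied to the question, a sketch that drops `File.Exists()` entirely: it opens the file once, so the open itself reports a missing or locked file and there is no gap between checking and using. `XmlFileLoader.LoadOrReport` and its messages are hypothetical; `FileShare.Read` asks the OS to keep other processes from writing while we parse (strictly enforced on Windows).

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlFileLoader
{
    // Returns the document, or null with a user-facing reason in 'problem'.
    public static XDocument LoadOrReport(string path, out string problem)
    {
        try
        {
            // Opening the file IS the existence check; no File.Exists() race.
            using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                problem = null;
                return XDocument.Load(stream);
            }
        }
        catch (FileNotFoundException)
        {
            problem = "The file was not found.";
        }
        catch (DirectoryNotFoundException)
        {
            problem = "The folder was not found.";
        }
        catch (IOException ex) // e.g. the file is locked by another process
        {
            problem = "The file could not be opened: " + ex.Message;
        }
        catch (UnauthorizedAccessException)
        {
            problem = "Access to the file was denied.";
        }
        catch (XmlException ex) // opened fine, but the content is not well-formed XML
        {
            problem = "The file is not well-formed XML: " + ex.Message;
        }
        return null;
    }
}
```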

If you have an XSD for the XML, try this:

using System;
using System.Xml;
using System.Xml.Schema;
using System.IO;
public class ValidXSD 
{
    public static void Main()
    {
        // Set the validation settings.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessInlineSchema;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);

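        // Create the XmlReader object; "using" disposes it (and closes the file) when done.
        using (XmlReader reader = XmlReader.Create("inlineSchema.xml", settings))
        {
            // Parse the file; problems are reported to ValidationCallBack.
            while (reader.Read()) { }
        }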
    }

    // Display any warnings or errors.
    private static void ValidationCallBack (object sender, ValidationEventArgs args) 
    {
        if (args.Severity == XmlSeverityType.Warning)
            Console.WriteLine("\tWarning: Matching schema not found.  No validation occurred." + args.Message);
        else
            Console.WriteLine("\tValidation error: " + args.Message);
    }  
}

Reference is here:

http://msdn.microsoft.com/en-us/library/system.xml.xmlreadersettings.validationeventhandler.aspx
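Note that the sample above only validates against a schema embedded in the XML file (`ProcessInlineSchema`). If the XSD is a separate file, a sketch like the following adds it to `settings.Schemas` and collects every message instead of printing it. `XsdCheck.Validate` is a hypothetical name; an empty list means no errors or warnings were reported.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

public static class XsdCheck
{
    // Validates xmlPath against an external XSD file and returns all messages.
    public static List<string> Validate(string xmlPath, string xsdPath)
    {
        var problems = new List<string>();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        // null target namespace: use the schema's own targetNamespace attribute.
        settings.Schemas.Add(null, xsdPath);
        // With a handler attached, validation problems are reported here instead of thrown.
        settings.ValidationEventHandler += (sender, args) =>
            problems.Add(args.Severity + ": " + args.Message);

        try
        {
            using (XmlReader reader = XmlReader.Create(xmlPath, settings))
            {
                while (reader.Read()) { }
            }
        }
        catch (XmlException ex) // not even well-formed, so validation cannot continue
        {
            problems.Add("Error: " + ex.Message);
        }
        return problems;
    }
}
```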

Manual5355
Colby Africa

As has previously been mentioned, "valid XML" (strictly speaking, well-formed XML) is tested by XmlDocument.Load(); just catch the exception. If you also need to validate against a schema, then this does what you're after:

using System.Xml;
using System.Xml.Schema;

static class Program
{
    private static bool _Valid = true; //Until we find otherwise

    // Must match the ValidationEventHandler delegate: (object, ValidationEventArgs).
    private static void Invalidated(object sender, ValidationEventArgs e)
    {
        _Valid = false;
    }

    private static bool Validated(string XmlPath, string XsdPath)
    {
        XmlSchema MySchema;
        using (var Xsd = XmlReader.Create(XsdPath))
        {
            MySchema = XmlSchema.Read(Xsd, Invalidated);
        }

        var MySettings = new XmlReaderSettings
        {
            IgnoreComments = true,
            IgnoreProcessingInstructions = true,
            IgnoreWhitespace = true,
            // Without a ValidationType and a schema, the reader only checks well-formedness.
            ValidationType = ValidationType.Schema
        };
        MySettings.Schemas.Add(MySchema);
        MySettings.ValidationEventHandler += Invalidated;

        using (var MyXml = XmlReader.Create(XmlPath, MySettings))
        {
            while (MyXml.Read()) {
              //Parsing... schema violations are reported to Invalidated
            }
        }
        return _Valid;
    }

    public static void Main()
    {
        var XsdPath = "C:\\Path\\To\\MySchemaDocument.xsd";
        var XmlPath = "C:\\Path\\To\\MyXmlDocument.xml";

        var WellFormed = true;

        XmlDocument xDoc = new XmlDocument();
        try {
            xDoc.Load(XmlPath);
        }
        catch (XmlException) {
            WellFormed = false;
        }

        // && short-circuits, so the schema check only runs on well-formed files.
        // Validated() opens its own reader because xDoc.Load() has already consumed the file.
        if (WellFormed && Validated(XmlPath, XsdPath)) {
          //Do stuff with my well-formed and validated XmlDocument instance...
        }
    }
}
BenAlabaster

I would not use XDocument.Load() as the accepted answer does; why read the entire file into memory when it could be a huge file?

I'd probably read the first few bytes into a byte array (the file could even be some binary file), convert the array to a string, e.g. with System.Text.Encoding.ASCII.GetString(byteArray), check whether that string contains the XML elements you are expecting, and only then continue.
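A sketch of that pre-filter, assuming the caller knows the expected root element name. `XmlSniffer`, `LooksLikeExpectedXml`, and the 1024-byte default are hypothetical. A match does not prove the file is well-formed; it only rejects obviously wrong files cheaply, and UTF-16 files will not match because of the ASCII decoding.

```csharp
using System.IO;
using System.Text;

public static class XmlSniffer
{
    // Cheap pre-filter only: reads up to maxBytes from the start of the file and
    // looks for the expected root tag. Note "<config" also matches "<configX".
    public static bool LooksLikeExpectedXml(string path, string expectedRoot, int maxBytes = 1024)
    {
        var buffer = new byte[maxBytes];
        int read = 0;
        int n;
        using (var stream = File.OpenRead(path))
        {
            // Stream.Read may return fewer bytes than asked, so loop until full or EOF.
            while (read < buffer.Length && (n = stream.Read(buffer, read, buffer.Length - read)) > 0)
                read += n;
        }
        // ASCII as in the answer; non-ASCII bytes become '?', harmless for a tag search.
        string head = Encoding.ASCII.GetString(buffer, 0, read);
        return head.Contains("<" + expectedRoot);
    }
}
```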

joedotnot
  • This wouldn't tell you whether the XML is valid or not. – yawnobleix Aug 19 '19 at 13:37
  • Yes, I know it would not tell me about validity, but it's a preliminary test I would do to immediately reject invalid files (e.g. PDF, binary, etc.) and even well-formed XML files which are not in my expected format. – joedotnot Aug 19 '19 at 13:58

I know this thread is almost 12 years old, but I would still like to add my solution, as I can't find it anywhere else. What I think you want is just a way to check whether the file is an XML file, not whether it is well structured or anything (that's how I understand the question).

I found a way to easily check whether a file is an XML file (or whatever file type you need; this works for any extension), and that would be the following line of code:

new System.IO.FileInfo(filePath).Extension == ".xml"

Just replace the "filePath" with the path of your file and you're good to go. You can put the statement wherever a boolean is expected.

You can use it like this:

bool isXmlFile = new FileInfo("c:\\config.xml").Extension == ".xml"; //will return true
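Note that `FileInfo.Extension` keeps the file name's casing, so `CONFIG.XML` fails the `== ".xml"` test. A case-insensitive sketch (`ExtensionCheck.HasXmlExtension` is a hypothetical name):

```csharp
using System;
using System.IO;

public static class ExtensionCheck
{
    // Same idea as the FileInfo.Extension test, but ignores case.
    // It only inspects the name, never the file's content.
    public static bool HasXmlExtension(string filePath) =>
        string.Equals(Path.GetExtension(filePath), ".xml", StringComparison.OrdinalIgnoreCase);
}
```

Either way this only looks at the name, so a misnamed or malformed file still gets through and the load still needs its `XmlException` handler.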
baltermia
  • The question is "to make sure it's a *valid* XML file". Therefore I think [this answer](https://stackoverflow.com/a/375734/861716) says it all. Note that many other extensions may be used for valid XML. – Gert Arnold Dec 09 '20 at 14:16
  • @GertArnold OP says _"In other words, my user can accidentally click on a different file in the file browser and trying to load, say, a .php file causes an exception."_ So from my understanding he just needs to check whether it's a .xml or .php file. Otherwise that statement would not make much sense. – baltermia Dec 09 '20 at 14:35
  • Whatever, an extension check is insufficient. In the end, parsing the file is the only approach that covers it. – Gert Arnold Dec 09 '20 at 14:40