27

Does anyone know how I can check if a string contains well-formed XML without using something like XmlDocument.LoadXml() in a try/catch block? I've got input that may or may not be XML, and I want code that recognises this without relying on a try/catch, both for speed and on the general principle that non-exceptional circumstances shouldn't raise exceptions. I currently have code that does this:

private bool IsValidXML(string value)
{
    try
    {
        // Check we actually have a value
        if (string.IsNullOrEmpty(value) == false)
        {
            // Try to load the value into a document
            XmlDocument xmlDoc = new XmlDocument();

            xmlDoc.LoadXml(value);

            // If we managed with no exception then this is valid XML!
            return true;
        }
        else
        {
            // A blank value is not valid xml
            return false;
        }
    }
    catch (System.Xml.XmlException)
    {
        return false;
    }
}

But it seems like something that shouldn't require the try/catch. The exception is causing merry hell during debugging because every time I check a string the debugger will break here, 'helping' me with my pesky problem.

Steve Cooper
  • If the debugger is your problem you can just switch off user handling of XmlExceptions. Use the shortcut within VS: Ctrl + Alt + E, find System.Xml.XmlException and toggle it off. – bytedev Jan 31 '13 at 15:40
  • 3
    Amazing how every single answer is a try/catch answer, despite you explicitly pointing out that you are looking for a solution without try/catch. Try/catch is not an IF statement; it should not be part of the process. It's for handling EXCEPTIONS. It's kinda obvious from the name :) I hope you find a good answer one day. – Christian Jul 28 '20 at 10:36

11 Answers

23

I don't know a way of validating without the exception, but you can change the debugger settings to only break for XmlException if it's unhandled - that should solve your immediate issues, even if the code is still inelegant.

To do this, go to Debug / Exceptions... / Common Language Runtime Exceptions and find System.Xml.XmlException, then make sure only "User-unhandled" is ticked (not Thrown).

Jon Skeet
  • +1 for this life-saving solution. I only enable breaking on handled exception if I have to debug the failing code. – OregonGhost Jun 22 '09 at 09:28
8

Steve,

We had a third party that sometimes accidentally sent us JSON instead of XML. Here is what I implemented:

public static bool IsValidXml(string xmlString)
{
    Regex tagsWithData = new Regex("<\\w+>[^<]+</\\w+>");

    //Light checking
    if (string.IsNullOrEmpty(xmlString) || tagsWithData.IsMatch(xmlString) == false)
    {
        return false;
    }

    try
    {
        XmlDocument xmlDocument = new XmlDocument();
        xmlDocument.LoadXml(xmlString);
        return true;
    }
    catch (Exception e1)
    {
        return false;
    }
}

[TestMethod()]
public void TestValidXml()
{
    string xml = "<result>true</result>";
    Assert.IsTrue(Utility.IsValidXml(xml));
}

[TestMethod()]
public void TestIsNotValidXml()
{
    string json = "{ \"result\": \"true\" }";
    Assert.IsFalse(Utility.IsValidXml(json));
}
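
The regex above only matches simple elements of the form <tag>text</tag>, so it would also reject well-formed input such as <result/> or a lone element with attributes. A looser pre-check, sketched below, only tests that the trimmed string starts with '<' and ends with '>' before handing it to the parser. The names XmlPreCheck and LooksLikeXml are hypothetical, and the filter is a necessary condition rather than a proof of well-formedness, so the try/catch stays in place for input that passes it.

```csharp
using System;
using System.Xml;

public static class XmlPreCheck
{
    // Cheap filter (hypothetical helper): a well-formed XML document,
    // once trimmed, starts with '<' and ends with '>'.
    public static bool LooksLikeXml(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return false;
        }

        string trimmed = value.Trim();
        return trimmed.Length >= 4          // shortest document is "<a/>"
            && trimmed[0] == '<'
            && trimmed[trimmed.Length - 1] == '>';
    }

    public static bool IsValidXml(string value)
    {
        // Obvious non-XML (JSON, plain text, empty strings) never reaches the parser.
        if (!LooksLikeXml(value))
        {
            return false;
        }

        try
        {
            new XmlDocument().LoadXml(value);
            return true;
        }
        catch (XmlException)
        {
            // Input that looked like XML but is not well-formed.
            return false;
        }
    }
}
```

With this shape, the exception is only raised for input that resembles XML but is malformed, which should be the rarer case when the usual bad input is JSON or an empty string.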
Greg Finzer
6

That's a reasonable way to do it, except that the IsNullOrEmpty is redundant (LoadXml can figure that out fine). If you do keep IsNullOrEmpty, do if(!string.IsNullOrEmpty(value)).

Basically, though, your debugger is the problem, not the code.

Matthew Flaschen
  • I've come to agree. I've marked up the method with a debugger attribute [DebuggerStepThrough] which stops the debugger stopping on the exception. – Steve Cooper Sep 22 '10 at 09:47
  • The IsNullOrEmpty is just an optimization to avoid the overhead of an exception when you call IsValidXml("") -- which happens a great deal in my program. – Steve Cooper Sep 22 '10 at 09:48
4

Add the [System.Diagnostics.DebuggerStepThrough] attribute to the IsValidXml method. This stops the debugger from breaking on the XmlException, which means you can turn on breaking for first-chance exceptions and this particular method will still be skipped.
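
Applied to the method from the question, the attribute could look like the sketch below. The class name XmlChecks is hypothetical, and whether the debugger still breaks inside the method may depend on the Visual Studio version and on settings such as Just My Code.

```csharp
using System.Diagnostics;
using System.Xml;

public static class XmlChecks
{
    // Asks the debugger to step over this method rather than into it.
    [DebuggerStepThrough]
    public static bool IsValidXml(string value)
    {
        // A blank value is not valid XML; returning early avoids an exception.
        if (string.IsNullOrEmpty(value))
        {
            return false;
        }

        try
        {
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.LoadXml(value);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```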

Steve Cooper
2

Be careful when using XmlDocument: it is possible to load an element along the lines of <0>some text</0> using XmlDocument doc = (XmlDocument)JsonConvert.DeserializeXmlNode(object) without an exception being thrown.

Numeric element names are not valid XML, and in my case an error did not occur until I tried to write the xmlDoc.InnerText to a SQL Server column of type xml.

This is how I validate now, and an exception gets thrown:
XmlDocument tempDoc = (XmlDocument)JsonConvert.DeserializeXmlNode(formData.ToString(), "data");
XmlDocument doc = new XmlDocument();
doc.LoadXml(tempDoc.InnerXml);

golfalot
  • 1
    Good point - the XML standard says 'letter followed by zero or more name characters' -- [4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender [5] Name ::= (Letter | '_' | ':') (NameChar)* – Steve Cooper Nov 09 '15 at 07:32
1

The XmlTextReader class is an implementation of XmlReader, and provides a fast, performant parser. It enforces the rules that XML must be well-formed. It is neither a validating nor a non-validating parser since it does not have DTD or schema information. It can read text in blocks, or read characters from a stream.

Here is an example from another MSDN article, to which I have added code to read the whole contents of the XML stream:

string str = "<ROOT>AQID</ROOT>";
XmlTextReader r = new XmlTextReader(new StringReader(str));
try
{
  while (r.Read())
  {
  }
}
finally
{
  r.Close();
}

source: http://bytes.com/topic/c-sharp/answers/261090-check-wellformedness-xml
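
The snippet above lets the XmlException propagate to the caller. As a minimal sketch, the same read-to-the-end loop can be wrapped in a method that returns a bool, here using XmlReader.Create rather than constructing XmlTextReader directly. The names XmlReaderCheck and IsWellFormed are hypothetical. It still relies on catching the exception, but it avoids building a DOM, and with default reader settings a document containing a DOCTYPE is expected to be rejected as well.

```csharp
using System.IO;
using System.Xml;

public static class XmlReaderCheck
{
    // Returns true when the reader consumes the whole string without error.
    public static bool IsWellFormed(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return false;
        }

        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(value)))
            {
                // Read every node; the reader throws on the first well-formedness error.
                while (reader.Read())
                {
                }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```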

dandan78
Shivanath D
0

I disagree that the problem is the debugger. In general, for non-exceptional cases, exceptions should be avoided. This means that if someone is looking for a method like IsWellFormed() which returns true/false based on whether the input is well-formed XML or not, exceptions should not be thrown within this implementation, regardless of whether they are caught and handled or not.

Exceptions are expensive and should not be encountered during normal successful execution. An example is writing a method that checks for the existence of a file by calling File.Open and catching the exception when the file doesn't exist. This would be a poor implementation. Instead, File.Exists() should be used (and hopefully its implementation does not simply put a try/catch around some method that throws if the file doesn't exist; I'm sure it doesn't).

Lipis
nickdu
  • I'm not sure this answer is helpful. You haven't supplied an alternative way to check well-formed-ness that doesn't throw an exception. It seems like a statement about your thoughts on methods that throw exceptions. – Duncan Jones Oct 23 '12 at 17:54
  • Steve explicitly asks for a way to do this without try-catch, so telling him he should do it without try-catch really isn't helpful and borders on sarcasm. – Paul Groke Oct 23 '12 at 17:57
  • I wasn't attempting to be sarcastic and I know I wasn't answering the question. I figured that was obvious. I was commenting on other comments. I guess I should have added my reply as a comment to the reply I was commenting on. – nickdu Dec 31 '12 at 14:41
0

Just my 2 cents: there are various questions about this around, and most people agree on the "garbage in, garbage out" principle. I don't disagree with that, but personally I found the following quick-and-dirty solution useful, especially when you deal with XML data from third parties who simply do not communicate with you easily. It doesn't avoid using try/catch, but it uses it with finer granularity, so it helps in cases where the quantity of invalid XML characters is not that big.

I used XmlTextReader and its ReadChars() method for each parent element. ReadChars() is one of the commands that do not perform well-formedness checks, unlike ReadInnerXml()/ReadOuterXml(), which do. So it's a combination of Read() and ReadChars(), used when Read() stumbles upon a parent node. Of course, this works only because I can assume that the basic structure of the XML is okay, while the contents (values) of certain nodes can contain special characters that haven't been replaced with their &..; equivalents. (I found an article about this somewhere, but can't find the source link at the moment.)

hello_earth
0

I'm using this function for verifying strings/fragments:

<Runtime.CompilerServices.Extension()>
Public Function IsValidXMLFragment(ByVal xmlFragment As String, Optional Strict As Boolean = False) As Boolean
    IsValidXMLFragment = True

    Dim NameTable As New Xml.NameTable

    Dim XmlNamespaceManager As New Xml.XmlNamespaceManager(NameTable)
    XmlNamespaceManager.AddNamespace("xsd", "http://www.w3.org/2001/XMLSchema")
    XmlNamespaceManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")

    Dim XmlParserContext As New Xml.XmlParserContext(Nothing, XmlNamespaceManager, Nothing, Xml.XmlSpace.None)

    Dim XmlReaderSettings As New Xml.XmlReaderSettings
    XmlReaderSettings.ConformanceLevel = Xml.ConformanceLevel.Fragment
    XmlReaderSettings.ValidationType = Xml.ValidationType.Schema
    If Strict Then
        XmlReaderSettings.ValidationFlags = (XmlReaderSettings.ValidationFlags Or XmlSchemaValidationFlags.ProcessInlineSchema)
        XmlReaderSettings.ValidationFlags = (XmlReaderSettings.ValidationFlags Or XmlSchemaValidationFlags.ReportValidationWarnings)
    Else
        XmlReaderSettings.ValidationFlags = XmlSchemaValidationFlags.None
        XmlReaderSettings.ValidationFlags = (XmlReaderSettings.ValidationFlags Or XmlSchemaValidationFlags.AllowXmlAttributes)
    End If

    AddHandler XmlReaderSettings.ValidationEventHandler, Sub() IsValidXMLFragment = False
    AddHandler XmlReaderSettings.ValidationEventHandler, AddressOf XMLValidationCallBack

    Dim XmlReader As Xml.XmlReader = Xml.XmlReader.Create(New IO.StringReader(xmlFragment), XmlReaderSettings, XmlParserContext)
    While XmlReader.Read
        'Read entire XML
    End While
End Function

I'm using this function for verifying files:

Public Function IsValidXMLDocument(ByVal Path As String, Optional Strict As Boolean = False) As Boolean
    IsValidXMLDocument = IO.File.Exists(Path)
    If Not IsValidXMLDocument Then Exit Function

    Dim XmlReaderSettings As New Xml.XmlReaderSettings
    XmlReaderSettings.ConformanceLevel = Xml.ConformanceLevel.Document
    XmlReaderSettings.ValidationType = Xml.ValidationType.Schema
    If Strict Then
        XmlReaderSettings.ValidationFlags = (XmlReaderSettings.ValidationFlags Or XmlSchemaValidationFlags.ProcessInlineSchema)
        XmlReaderSettings.ValidationFlags = (XmlReaderSettings.ValidationFlags Or XmlSchemaValidationFlags.ReportValidationWarnings)
    Else
        XmlReaderSettings.ValidationFlags = XmlSchemaValidationFlags.None
        XmlReaderSettings.ValidationFlags = (XmlReaderSettings.ValidationFlags Or XmlSchemaValidationFlags.AllowXmlAttributes)
    End If
    XmlReaderSettings.CloseInput = True

    AddHandler XmlReaderSettings.ValidationEventHandler, Sub() IsValidXMLDocument = False
    AddHandler XmlReaderSettings.ValidationEventHandler, AddressOf XMLValidationCallBack

    Using FileStream As New IO.FileStream(Path, IO.FileMode.Open)
        Using XmlReader As Xml.XmlReader = Xml.XmlReader.Create(FileStream, XmlReaderSettings)
            While XmlReader.Read
                'Read entire XML
            End While
        End Using
    End Using
End Function
VoteCoffee
0

In addition, when only verifying the syntactic correctness of the XML string (when there is no need to resolve an external schema), I think setting XmlResolver = null may be a good idea. This prevents any Web access while parsing, which also improves security (malicious XML content cannot direct the code to access bad sites). Code follows (requires C# 2.0 or higher):

public static bool IsValidXml(string candidateString)
{
    try
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.XmlResolver = null;
        XmlDocument document = new XmlDocument();
        document.XmlResolver = null;
        document.Load(XmlReader.Create(new MemoryStream(Encoding.UTF8.GetBytes(candidateString)), settings));
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

A more concise version using var and object initializers, for C# 6.0 or higher:

public static bool IsValidXml(string candidateString)
{
    try
    {
        var settings = new XmlReaderSettings { XmlResolver = null };
        var document = new XmlDocument() { XmlResolver = null };
        document.Load(XmlReader.Create(new MemoryStream(Encoding.UTF8.GetBytes(candidateString)), settings));
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}
robbie fan
-2

My two cents. This was pretty simple and follows some common conventions since it's about parsing...

public bool TryParse(string s, out XmlDocument result)
{
    try {
        result = new XmlDocument();
        result.LoadXml(s);
        return true;
    } catch (XmlException) {
        // Parsing failed: follow the TryParse convention and hand back null
        result = null;
        return false;
    }
}
toddmo