The system I'm working on uses DataSet.ReadXml(XmlReader) to read an XML file and load its contents into a DataSet. The XML file comes from a business partner and may not always be well-formed, but this system is expected to perform reasonable corrections to the input.
We've seen errors in the XML input files, such as:
- Case 1: In the middle of a string value, use of characters such as '<', '>', or my favorite, '&', which causes "An error occurred while parsing EntityName. Line x, position y."
- Case 2: In the middle of a string value, weird constructs such as "<3" so that the text depicts a heart, which causes "Name cannot begin with the '3' character. Line x, position y."
- Case 3: Invalid characters for the given encoding, which causes "Invalid character in the given encoding. Line x, position y."
If some simple rules are adopted, these errors can be addressed programmatically:
- Case 1: Replace the offending character with its XML character entity ("&" becomes "&amp;", etc.)
- Case 2: Replace the "<" in "<3" with a space, so that it becomes " 3"
- Case 3: Replace the invalid character with a space
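To make the three rules concrete, here is a minimal sketch of how they might be applied to the raw text before it reaches the XmlReader. The class and method names are hypothetical, UTF-8 is assumed as the declared encoding, and Case 1 is only shown for '&', since a stray '<' or '>' cannot be told apart from real markup with a simple pattern.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; none of these names come from the original system.
public static class XmlTextRepair
{
    // An '&' that does not start a predefined entity or a numeric character reference.
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // A '<' immediately followed by a digit, as in "<3"; it can never open a tag.
    private static readonly Regex LessThanBeforeDigit = new Regex(@"<(?=[0-9])");

    // Case 1 (ampersand only): "&" becomes "&amp;".
    public static string EscapeBareAmpersands(string text)
    {
        return BareAmpersand.Replace(text, "&amp;");
    }

    // Case 2: "<3" becomes " 3".
    public static string BlankLessThanBeforeDigit(string text)
    {
        return LessThanBeforeDigit.Replace(text, " ");
    }

    // Case 3: decode the raw bytes, substituting a space for any byte
    // sequence that is invalid in the assumed encoding (UTF-8 here).
    public static string DecodeReplacingInvalidBytes(byte[] bytes)
    {
        Encoding lenientUtf8 = Encoding.GetEncoding(
            "utf-8",
            EncoderFallback.ReplacementFallback,
            new DecoderReplacementFallback(" "));
        return lenientUtf8.GetString(bytes);
    }
}
```

The repaired string could then be wrapped in a StringReader and handed to XmlReader.Create before calling DataSet.ReadXml. Applying the rules up front sidesteps the need to classify the exception at all, at the cost of touching input that might have parsed fine.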
However, all of these errors raise the same exception: System.Xml.XmlException.
I would like to take an appropriate action when any of these errors are encountered, but what's the best way to do that? These three different errors all have the same HRESULT value (-2146232000), and so far the only way I have been able to differentiate amongst them is by inspection of the XmlException.Message string property.
String comparison seems a lousy way to determine the exact cause of the error. Were I to follow that approach, the code would break should the exception message change in future versions of .NET. It would also not be portable to some languages.
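For concreteness, the message-inspection approach I'd rather avoid looks roughly like the sketch below. The class, enum, and method names are made up for illustration, and the message fragments are the English texts quoted above, which is exactly what makes it fragile. It also assumes a framework version where Exception.HResult is publicly readable.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical illustration of classifying an XmlException by its message text.
public static class XmlErrorProbe
{
    public enum XmlErrorKind { BadEntityName, BadNameStart, BadEncoding, Unknown }

    // Fragile by construction: it depends on the exact English wording of the message.
    public static XmlErrorKind Classify(XmlException ex)
    {
        string message = ex.Message;
        if (message.Contains("parsing EntityName")) return XmlErrorKind.BadEntityName;
        if (message.Contains("Name cannot begin with")) return XmlErrorKind.BadNameStart;
        if (message.Contains("Invalid character in the given encoding")) return XmlErrorKind.BadEncoding;
        return XmlErrorKind.Unknown;
    }

    // Reads the whole document; returns null if it parses, otherwise the guessed error kind.
    public static XmlErrorKind? Probe(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            // HResult is the same for every case; only the position and message differ.
            Console.WriteLine("HResult={0}, line={1}, position={2}: {3}",
                ex.HResult, ex.LineNumber, ex.LinePosition, ex.Message);
            return Classify(ex);
        }
    }
}
```

The only structured data the exception offers here is LineNumber and LinePosition; everything that distinguishes one cause from another is buried in the message string.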
Therefore, how does one programmatically differentiate between the various types of errors that could be represented in an XmlException?
EDIT
In the comments below I've received advice about the importance of ensuring that XML data is of high quality. I don't disagree, but as my question states, it's outside my control and I can do nothing about it. So, as well-intentioned as your remarks are, they miss the point. If you know a good way to differentiate amongst the very many errors that can be presented by the System.Xml.XmlException class, please, share your knowledge. Thank you.