In C# I need to catch an `XmlException`, but I also need to tell which error caused it, because it can be either Xml_InvalidRootData or Xml_UnexpectedEOF.

How can I achieve this?

I can only see those strings in the debugger, under the name "ResString".

But I want a multi-culture solution, so string comparison is something I want to avoid as much as possible.

The HResults are the same in both cases.

Daniel
  • Do those differentiations have different HResults? You could check for that. – Tom Aug 15 '19 at 21:19
  • @RufusL: yes, different actions are needed. – Daniel Aug 15 '19 at 21:25
  • @Tom: same HResults. – Daniel Aug 15 '19 at 21:25
  • @RufusL: yes, it is always 1, but for EOF it can also be 1, because the XML can be formatted on a single line. – Daniel Aug 15 '19 at 21:33
  • Good point. Perhaps you could do some validation on the xml file yourself in the `catch` block to try to determine the problem, but there aren't any built-in properties for that. In the end, it's just an instance of the [`XmlException`](https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmlexception?view=netframework-4.8) class. If the `HResult` and `InnerException` properties are the same for each, probably the `Message` is the best bet. But that could be localized, so that could be problematic. – Rufus L Aug 15 '19 at 21:40
  • Also, there are other kinds of `XmlException` as well, like "unexpected token", "root element is missing", etc. – Rufus L Aug 15 '19 at 21:45
  • @RufusL: what are the classnames of those exceptions? Thanks! – Daniel Aug 15 '19 at 21:58
  • They are all `XmlException` instances, just with different messages (I saw them when searching online for how to differentiate xml exception types). Probably if you have some way of coping with the different exceptions, you should just try to fix the xml in any way you can, then retry, and if the exception still happens then either let it bubble up or log it or whatever. – Rufus L Aug 15 '19 at 21:59
  • @Daniel - In the comments on Eric Lippert's answer, I linked to an XML library that uses red/green nodes like Roslyn. I've never used it so I can't vouch for it but it *might* be able to represent an erroneous structure in the AST and let you get the exact kind of malformation so you can communicate it to the user. See if that helps. – madreflection Aug 15 '19 at 22:53

1 Answer

If you take a look at

https://referencesource.microsoft.com/#System.Runtime.Serialization/System/Xml/XmlExceptionHelper.cs

you'll see that throughout, there is a lot of work done to get a (possibly localized) error string, which is then the only argument to new XmlException.
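
The same limitation can be observed from the caller's side. The following is only a small sketch: the `XmlExceptionProbe` class, the use of `XDocument.Parse`, and the two sample inputs are my own choices for illustration, not something from the question. It prints the public members that `XmlException` exposes so the two failures can be compared.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlExceptionProbe
{
    // Hypothetical helper: returns the XmlException thrown while parsing,
    // or null if the text parsed without error.
    public static XmlException TryParse(string text)
    {
        try
        {
            XDocument.Parse(text);
            return null;
        }
        catch (XmlException ex)
        {
            return ex;
        }
    }

    public static void Main()
    {
        // Assumed sample inputs: stray text at the root, and an unclosed element.
        foreach (var sample in new[] { "not xml", "<root><child>" })
        {
            XmlException ex = TryParse(sample);
            // Type, position and HResult are available, but no error-kind code.
            Console.WriteLine(
                $"{ex?.GetType().Name} line={ex?.LineNumber} pos={ex?.LinePosition} hresult={ex?.HResult}");
            Console.WriteLine(ex?.Message);
        }
    }
}
```

Apart from the position, the only member that varies with the kind of error is `Message`, which is the text that may be localized.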

As you correctly note, if you need to distinguish between different exception conditions to make some programmatic response, this is a whole lot of no help.

Since you do not want to examine the strings -- and that is a reasonable choice -- your best bet is probably to write your own XML parser that has the output you desire.

Consider the design of such a parser carefully. The output that you want is not the structured XML, but rather a detailed report explaining why it is not legal XML. Exceptions are a mechanism for handling exceptional situations; the designers of the XML parser considered malformed XML to be an exceptional situation; they thought this scenario should almost never happen. Since it almost never happens, and since when it does happen, there's nothing the program can do about it, there is no incentive to produce a detailed report that allows programmatic decisions to be made on the basis of what errors were detected.

But that is apparently not your situation; you have the opposite situation of the designers of the XML parser. You care about the error, and you wish to do something different depending on different errors, so the output of your parser should be the error report, not the XML syntax tree. It should not throw exceptions at all, because in your scenario, a malformed XML document is not exceptional; you expect it.

XML is not a particularly difficult language to lex and parse (provided you are not also trying to solve the problem of "is this document a valid instance of this schema?", which is a harder problem) so it should not take you long to produce an error-detecting lexer and parser, particularly since you have the source code of existing XML parsers to guide you. Good luck!
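
As a rough illustration of that design, here is a minimal sketch of a checker whose output is the error report rather than a syntax tree, and which never throws for malformed input. Everything in it is hypothetical: the `XmlProblem` enum, the `XmlDiagnostics.Diagnose` name, and the error categories are mine. It also assumes a heavily simplified XML: plain elements only, with no proper handling of comments, CDATA, DOCTYPE, multiple root elements, or `>` inside attribute values.

```csharp
using System;
using System.Collections.Generic;

public enum XmlProblem
{
    None,
    MissingRootElement,    // no element found at all
    InvalidRootData,       // non-whitespace text outside the root element
    UnexpectedEndOfInput,  // input ended inside a tag or with elements still open
    MismatchedEndTag       // end tag does not match the innermost open element
}

public static class XmlDiagnostics
{
    // Hypothetical sketch: reports the first problem found instead of throwing.
    public static XmlProblem Diagnose(string text)
    {
        var open = new Stack<string>();
        bool sawElement = false;
        int i = 0;

        while (i < text.Length)
        {
            if (text[i] != '<')
            {
                // Outside any element only whitespace is allowed.
                if (open.Count == 0 && !char.IsWhiteSpace(text[i]))
                    return XmlProblem.InvalidRootData;
                i++;
                continue;
            }

            int end = text.IndexOf('>', i);
            if (end < 0)
                return XmlProblem.UnexpectedEndOfInput; // tag never closed

            string tag = text.Substring(i + 1, end - i - 1);
            i = end + 1;

            // Simplification: skip declarations such as <?xml ...?> and <!...>.
            if (tag.Length > 0 && (tag[0] == '?' || tag[0] == '!'))
                continue;

            if (tag.Length > 0 && tag[0] == '/')
            {
                string name = tag.Substring(1).Trim();
                if (open.Count == 0 || open.Pop() != name)
                    return XmlProblem.MismatchedEndTag;
                continue;
            }

            sawElement = true;
            bool selfClosing = tag.Length > 0 && tag[tag.Length - 1] == '/';
            if (!selfClosing)
            {
                // The element name runs up to the first whitespace character.
                int ws = tag.IndexOfAny(new[] { ' ', '\t', '\r', '\n' });
                open.Push(ws < 0 ? tag : tag.Substring(0, ws));
            }
        }

        if (open.Count > 0) return XmlProblem.UnexpectedEndOfInput;
        return sawElement ? XmlProblem.None : XmlProblem.MissingRootElement;
    }
}
```

A caller can then `switch` on the returned value, for example treating `XmlProblem.UnexpectedEndOfInput` differently from `XmlProblem.InvalidRootData`, without ever comparing message strings.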

Eric Lippert
  • Do you know if there was a point before .NET 1.0 was released where they had to cut that level of detail? I started by wondering, "why don't they have it?" and of course, you usually say it's about the cost/benefit analysis. So now I wonder if the benefit was ever there at all to be weighed against a cost. – madreflection Aug 15 '19 at 22:07
  • Not sure if this is helpful, but those strings are resources found in a sealed class called **Res** which is embedded in the .NET Framework. On github, you can find the decompiled .NET for those strings: https://github.com/rashiph/DecompliedDotNetLibraries/blob/master/System.Xml/System/Xml/Res.cs – lunarquaker Aug 15 '19 at 22:30
  • @madreflection: I was not involved with the design of the library back in the early days, so my musings about what the designers were thinking are educated conjecture. But any time you're wondering about a design decision, start by thinking about the 99.9% case. The vast majority of developers who are parsing an XML document have a valid document and they want to know the value of some specific attribute in some specific tag because they are solving a business problem... – Eric Lippert Aug 15 '19 at 22:34
  • Interesting insight. I figured you weren't, but I was hoping maybe you picked up something while on the compiler team. Holding out for a 0.1% chance. :) – madreflection Aug 15 '19 at 22:36
  • @madreflection: ... I, on the other hand, am constantly frustrated by the design of almost all tag language parsers because I am using them *as part of a developer tool in an IDE* that needs to be able to handle malformed tags; in an IDE, the code is malformed most of the time because the user is typing it in; if it were correct, they wouldn't be typing! I care not a bit about the business case of the XML; I care about it at the character/token/syntax node level, and that information is usually not present in a useful form. And therefore I usually end up writing my own lexer and parser. – Eric Lippert Aug 15 '19 at 22:36
  • @EricLippert: I remember hearing about an XML parsing library that uses red/green trees like Roslyn. That sounded promising. I'll have to find it again. That might actually help the OP with this problem! – madreflection Aug 15 '19 at 22:37
  • I think this is it: [repo](https://github.com/KirillOsenkov/XmlParser) and [nuget.org](https://www.nuget.org/packages/GuiLabs.Language.Xml). Never had occasion to use it myself, which is why I had to find it again. – madreflection Aug 15 '19 at 22:43