I need to parse input that mostly looks like XHTML; according to the docs, though, it should be treated as XML that merely resembles XHTML.
So I am using AngleSharp.Xml to parse it, but parsing already fails on the simplest input:
<p>Ma&szlig;nahmen</p>
This is the code I use for parsing:
var config = Configuration.Default.WithDefaultLoader(new LoaderOptions
{
    IsResourceLoadingEnabled = true
}).WithCss().WithXml();
var context = BrowsingContext.New(config);
var xml = @"<xml><p>Ma&szlig;nahmen</p></xml>";
var xmlParser = new XmlParser(new XmlParserOptions(), context);
var xmlDoc = xmlParser.ParseDocument(xml);
And this is the resulting error:
Message:
Test method TestProject1.UnitTest1.TestParseEntity threw exception: AngleSharp.Xml.Parser.XmlParseException: Error while parsing the provided XML document.
Stack Trace:
XmlTokenizer.CharacterReference()
XmlTokenizer.DataText(Char c)
XmlTokenizer.Data(Char c)
XmlTokenizer.Get()
XmlDomBuilder.Parse(XmlParserOptions options)
XmlParser.Parse(XmlDocument document)
XmlParser.ParseDocument(String source)
What's wrong with my configuration? How can it properly resolve the &szlig; entity? Do I need to somehow add DTD references? Do those get resolved automatically, or do I have to implement this myself (like here)?