
I need to parse XML that mostly looks like XHTML. According to the docs it is plain XML that merely resembles XHTML.

So I turned to AngleSharp.Xml to parse it, but parsing already fails on the simplest input:

<p>Ma&szlig;nahmen</p>

This is the code I use for parsing:

var config = Configuration.Default.WithDefaultLoader(new LoaderOptions 
    { 
        IsResourceLoadingEnabled = true 
    }).WithCss().WithXml();

var context = BrowsingContext.New(config);

var xml = @"<xml><p>Ma&szlig;nahmen</p></xml>";

var xmlParser = new XmlParser(new XmlParserOptions(), context);
var xmlDoc = xmlParser.ParseDocument(xml);

And this is the resulting error:

Message:

Test method TestProject1.UnitTest1.TestParseEntity threw exception: AngleSharp.Xml.Parser.XmlParseException: Error while parsing the provided XML document.

Stack Trace: 

XmlTokenizer.CharacterReference()
XmlTokenizer.DataText(Char c)
XmlTokenizer.Data(Char c)
XmlTokenizer.Get()
XmlDomBuilder.Parse(XmlParserOptions options)
XmlParser.Parse(XmlDocument document)
XmlParser.ParseDocument(String source)

What's wrong with my configuration? How can it properly detect the &szlig;? Do I need to somehow add DTD references? Do those get resolved automatically or do I have to implement this (like here)?

spaleet
Heinrich Ulbricht

1 Answer


After finding this ticket and reading through the code, the answer turns out to be quite simple. The configuration can be given HtmlEntityProvider.Resolver, which takes over entity resolution and knows far more named entities (including &szlig;) than the XML entity resolver that was used before:

var config = Configuration.Default
    .With(HtmlEntityProvider.Resolver)
    .WithDefaultLoader(new LoaderOptions 
    { 
        IsResourceLoadingEnabled = true 
    }).WithCss().WithXml();
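
For completeness, here is a minimal end-to-end sketch that combines this configuration with the parsing code from the question. It leaves out the loader and CSS parts, on the assumption that they are not needed for parsing an in-memory string. The using directives and the class name EntityDemo are my assumptions, not taken from the ticket; adjust them to your AngleSharp and AngleSharp.Xml versions.

```csharp
using System;
using AngleSharp;
using AngleSharp.Html;        // HtmlEntityProvider (assumed namespace)
using AngleSharp.Xml;         // harmless if WithXml() lives in AngleSharp instead
using AngleSharp.Xml.Parser;  // XmlParser, XmlParserOptions

public static class EntityDemo // hypothetical name, for illustration only
{
    public static void Main()
    {
        // Register the HTML entity table so that named entities such as
        // &szlig; can be resolved by the XML tokenizer.
        var config = Configuration.Default
            .With(HtmlEntityProvider.Resolver)
            .WithXml();

        var context = BrowsingContext.New(config);
        var parser = new XmlParser(new XmlParserOptions(), context);

        // Same input as in the question.
        var doc = parser.ParseDocument("<xml><p>Ma&szlig;nahmen</p></xml>");

        // If the entity was resolved, the printed text contains the character ß.
        Console.WriteLine(doc.DocumentElement.TextContent);
    }
}
```

If the XmlParseException from the question still shows up, checking that the parser really receives the context built from this configuration would be my first step.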
Heinrich Ulbricht