1

I get an Expat error only when parsing specific characters; other HTML code is parsed just fine. I'm using the UTF-8 build of Expat (libexpatMT.lib) and working with char and std::string in a wrapper. No wide characters are used.

// The ampersand leads to: Expat error: *not well-formed (invalid token)*
<a href="http://www.myurl.com?a=b&c=d">Link</a>
<span>Tom & Jerry</span>
<h1>K&auml;se</h1>

I'm confused about why the ampersand is an invalid token here, since it is used within HTML entities like &amp; itself. Replacing the ampersands with &amp; or custom spacers doesn't work either.

Any suggestions? The ampersand is the issue here.

skaffman
Smamatti

1 Answer

3

In XML, a literal ampersand must always be escaped, even inside attribute values such as URLs. So the valid value is <a href="http://www.myurl.com?a=b&amp;c=d">Link</a>
Correct web pages do that, although browsers are quite tolerant of the error you made.
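
To illustrate the escaping described above, here is a minimal sketch, assuming the raw values are available as std::string before the markup is assembled. The name escapeXml is a hypothetical helper for this example and is not part of Expat.

```cpp
#include <string>

// Hypothetical helper (not an Expat function): replaces the five characters
// that XML treats specially with their predefined entity references.
std::string escapeXml(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    for (char ch : input) {
        switch (ch) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += ch;       break;
        }
    }
    return out;
}
```

Apply it to individual text or attribute values (for example the URL or "Tom & Jerry"), not to the whole document, since that would escape the tags as well. Note also that XML predefines only those five entities, so a named HTML entity such as &auml; is likely to be rejected unless a DTD declares it; since the input is UTF-8, writing the character directly is one way around that.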

PhiLho
  • This does not work. It seems like I have to 'double escape' the values like: `&auml;` for text like `Tom&Jerry`. Thanks!
    – Smamatti Jul 28 '11 at 18:58
  • There is something wrong there. Check whether a double conversion/unescaping is occurring in your code, perhaps. – PhiLho Jul 29 '11 at 05:35
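
One way to guard against the double conversion mentioned in the comments is to escape only ampersands that do not already start something shaped like an entity reference. The following is a rough sketch under that assumption; looksLikeReference and escapeBareAmpersands are hypothetical names, and the check is a heuristic (a string such as "a=b&c;d" would be misread as containing a reference).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if the '&' at text[pos] begins something shaped like an
// entity or character reference, e.g. "&amp;", "&auml;" or "&#228;".
static bool looksLikeReference(const std::string& text, std::size_t pos) {
    std::size_t i = pos + 1;
    if (i < text.size() && text[i] == '#') ++i;  // numeric reference
    const std::size_t start = i;
    while (i < text.size() &&
           std::isalnum(static_cast<unsigned char>(text[i]))) {
        ++i;
    }
    // Needs at least one name character followed by a terminating ';'.
    return i > start && i < text.size() && text[i] == ';';
}

// Escapes bare ampersands and leaves existing references untouched,
// so running it twice gives the same result as running it once.
std::string escapeBareAmpersands(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t pos = 0; pos < text.size(); ++pos) {
        if (text[pos] == '&' && !looksLikeReference(text, pos)) {
            out += "&amp;";
        } else {
            out += text[pos];
        }
    }
    return out;
}
```

This leaves &auml; as it is, which does not make it valid XML: without a DTD declaring it, only &amp;, &lt;, &gt;, &quot; and &apos; are predefined.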