0

I'm running into an issue with the HtmlUnit parser. I'm trying to grab some XML from a website (using the website's API), do a quick parse of the resulting XML, and then save the XML to a file (all within the rights of the API). (sample content)

Unfortunately, the website returns the entity &iquest; in some of the requested pages, and while this is a valid HTML entity, HtmlUnit throws an exception during the parse with the message:

The entity "iquest" was referenced, but not declared.

How do I define iquest as a valid entity?

Mark Elliot

1 Answer

1

You can't define &iquest; except by editing the data you received. The data is not XML, as any validator will show (e.g. the first one I found on Google).

The site is not serving valid XML, so the best way is to ask it to fix the XML.

If that fails, either do a search and replace on &iquest; (for example, swapping in the numeric reference &#191;) or add a DOCTYPE that defines the entity iquest.
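Both workarounds can be sketched in plain Java, outside HtmlUnit, using the JDK's built-in DOM parser. This is only an illustration under some assumptions: the class and method names (`EntityFix`, `replaceIquest`, `declareIquest`, `rootText`) are made up for this example, the payload is assumed to be small enough to hold in a String, and the DOCTYPE variant assumes the response has no XML declaration or DOCTYPE of its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; none of these names come from HtmlUnit or the site's API.
class EntityFix {

    // Option 1: replace the undeclared named entity with its numeric
    // character reference (U+00BF is decimal 191), which needs no declaration.
    public static String replaceIquest(String xml) {
        return xml.replace("&iquest;", "&#191;");
    }

    // Option 2: prepend a DOCTYPE whose internal subset declares the entity.
    // Assumes rootName matches the document's root element and that the
    // payload does not already start with an XML declaration or DOCTYPE.
    public static String declareIquest(String xml, String rootName) {
        return "<!DOCTYPE " + rootName + " [<!ENTITY iquest \"&#191;\">]>" + xml;
    }

    // Parses the string with the JDK's default DOM parser and returns the
    // text content of the root element.
    public static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String raw = "<msg>&iquest;hola</msg>"; // made-up sample payload
        System.out.println(rootText(replaceIquest(raw)));
        System.out.println(rootText(declareIquest(raw, "msg")));
    }
}
```

The search-and-replace route is the simpler of the two, but it only covers the entities you list explicitly; the DOCTYPE route keeps the payload untouched apart from the prefix and can declare several entities at once.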

mmmmmm
  • Fair enough. I'd love to be able to intercept the stream and use the HtmlUnit parser; instead, I'm taking the content stream and parsing it outside of the HtmlUnit framework with these invalid entities stripped. – Mark Elliot Jun 28 '10 at 18:56