I'm trying to load a chunk of HTML into MSXML's DOMDocument. The chunk is valid XML with one exception: it contains &nbsp; entities. MSXML chokes on them and reports "Reference to undefined entity 'nbsp'."

Can I make MSXML recognize it as valid somehow?

Seva Alekseyev

1 Answer

Simple solution: run a text replacement of "&nbsp;" with " " before parsing the document. This should be safe, because well-formed XML text cannot contain a verbatim &nbsp; that ought to be left alone (CDATA sections are the one place to watch out for).
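A minimal sketch of that replacement, written in Python with the standard-library parser standing in for MSXML (an assumption made purely for illustration; the same string replacement would apply before calling MSXML's loadXML). The helper name `replace_nbsp` is hypothetical. By default it substitutes the numeric character reference &#xA0;, which keeps a true non-breaking space; pass " " to get a plain space as described above.

```python
import xml.etree.ElementTree as ET


def replace_nbsp(xml_text: str, replacement: str = "&#xA0;") -> str:
    """Replace every &nbsp; reference with text any XML parser accepts."""
    return xml_text.replace("&nbsp;", replacement)


chunk = "<p>one&nbsp;two</p>"

# Without the replacement, the parser rejects the undefined entity.
try:
    ET.fromstring(chunk)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# After the replacement, the chunk parses and the text holds U+00A0.
root = ET.fromstring(replace_nbsp(chunk))
```

Note that a blind string replacement also touches any &nbsp; inside a CDATA section, so it is only appropriate when the input has none.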

More standard solution: declare the nbsp entity in the XML by inserting

<!DOCTYPE foobar [
   <!ENTITY nbsp " " >
]>

before the XML root element (and after the XML declaration, if there is one).
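A rough sketch of inserting such a declaration programmatically, again in Python with the standard-library parser as a stand-in for MSXML (an assumption for illustration only). The helper `add_nbsp_doctype` and its `root_name` parameter are hypothetical, and the entity is mapped to &#xA0; here rather than a plain space. As far as I know, MSXML 6.0 rejects DOCTYPE declarations unless the ProhibitDTD property is set to false, so check that setting if the declaration alone does not help.

```python
import re
import xml.etree.ElementTree as ET

# Internal DTD subset that defines nbsp as U+00A0.
DOCTYPE_TEMPLATE = '<!DOCTYPE {root} [\n  <!ENTITY nbsp "&#xA0;">\n]>\n'


def add_nbsp_doctype(xml_text: str, root_name: str) -> str:
    """Insert a DOCTYPE that declares nbsp before the root element."""
    doctype = DOCTYPE_TEMPLATE.format(root=root_name)
    # An XML declaration, if present, must stay at the very start,
    # so the DOCTYPE goes right after it.
    match = re.match(r"<\?xml\s[^?]*\?>", xml_text)
    if match:
        end = match.end()
        return xml_text[:end] + "\n" + doctype + xml_text[end:]
    return doctype + xml_text


doc = add_nbsp_doctype('<?xml version="1.0"?><p>one&nbsp;two</p>', "p")
root = ET.fromstring(doc)  # the entity reference is expanded on parse
```

This still modifies the text handed to the parser, but it leaves the body of the document untouched and keeps the entity references themselves intact.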

If you actually want a non-breaking space instead of a normal space, use the character 0xA0 (U+00A0) itself or the character reference &#x00A0; as the replacement.

BeniBela
  • Ended up with the first option. Oh well. I was hoping it was possible to load those files without modifying them - guess not. – Seva Alekseyev Mar 01 '13 at 15:29