How Can I Preserve Character Entities In .Net XDocument?

Question

I'm porting a set of services to .NET 4.0 and have discovered (much to my dismay) that character entities I'm creating and storing in `XElement.Value` properties are being "restored" to their original character values when I convert the XDocument object into an XML stream for the HTTP response.

The "escaped" characters need to appear in the XML document as character entities (e.g. `&#174;` and not ®) to remain compatible with legacy applications that were written to only allow character entities for non-Latin characters.

Is there a way (a different document type, or Encoding() method, or something else altogether) I can configure XDocument to preserve these character entities when I create my XML stream? Maybe there's an alternative to XDocument or XmlDocument that I can use instead?

If you want the text to be stored as `&#174;` and not ®, you have to escape all reserved characters (such as `&`) using any of the mechanisms available in XML (character data, escape characters, etc.). This means that you will not be storing the character ® in your XML document; you will be storing an ampersand, a hash sign, a few digits and a semicolon. But if that's what you want, that's what you should do. :) If you just set `.Value` to `&#123;` it should be stored verbatim, not parsed. Are you sure you can reproduce that? – bzlm, Feb 02 '11 at 22:12
You would think so, wouldn't you. :-) It converts the ampersand to `&amp;`, so you end up with weird gibberish in the output. Yeah, I could .Replace() them with a real ampersand before I stream out the response, but I was hoping I'd just missed a configuration flag, or setting, or reader / writer, or... – jerhewet, Feb 03 '11 at 16:13
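
The behaviour discussed in these two comments can be reproduced with a minimal sketch like the one below. The class name `EscapingDemo`, the helper `Serialize`, and the element name `name` are made up for illustration; only `XElement` and its members come from LINQ to XML. It shows that text assigned to an element is treated as literal characters (so the `&` is escaped on output), while parsing turns a character reference back into the character it stands for.

```csharp
using System;
using System.Xml.Linq;

static class EscapingDemo
{
    // Hypothetical helper: builds <name>text</name> and returns its markup.
    public static string Serialize(string text)
    {
        return new XElement("name", text).ToString();
    }

    static void Main()
    {
        // The value is plain text, so the ampersand itself gets escaped.
        Console.WriteLine(Serialize("&#174;"));   // <name>&amp;#174;</name>

        // Parsing goes the other way: the reference becomes the character.
        var parsed = XElement.Parse("<name>&#174;</name>");
        Console.WriteLine(parsed.Value == "\u00AE");   // True
    }
}
```

In other words, the object model only ever holds characters; whether a character is written as itself or as a reference is decided when the document is serialized.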

score 0 · Answer 1 · answered Feb 02 '11 at 23:15

Have you tried creating an XmlWriter with the encoding set to latin-1 and then saving the XDocument using it? I haven't tried it, but it might coerce it to use unnecessary character entities.
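
A rough sketch of that idea follows, with one deliberate change: it uses US-ASCII rather than Latin-1, because Latin-1 can represent characters such as ® directly and so would presumably have no reason to escape them. The class name `AsciiXmlDemo`, the helper `ToAsciiXml`, and the sample element are hypothetical. It relies on the assumption that an `XmlWriter` created over a `Stream` writes any character the target encoding cannot represent as a numeric character reference; as far as I know that reference is in hexadecimal form (for example `&#xAE;`), not decimal, so check whether the legacy consumers accept that form.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class AsciiXmlDemo
{
    // Hypothetical helper: serializes the document through an ASCII-encoded
    // XmlWriter so that the output bytes contain only 7-bit characters.
    public static string ToAsciiXml(XDocument doc)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };
        using (var stream = new MemoryStream())
        {
            // Write to a Stream, not a StringBuilder or StringWriter:
            // with a TextWriter target the text writer's own encoding is used.
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }

    static void Main()
    {
        // Sample document containing the registered-trademark sign (U+00AE).
        var doc = new XDocument(new XElement("name", "Acme\u00AE"));
        Console.WriteLine(ToAsciiXml(doc));
    }
}
```

In a service, the same settings could be applied to an `XmlWriter` created directly over the response stream instead of a `MemoryStream`.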

And what kind of horrible software are you using if it doesn't even support Unicode?

Matti Virkkunen
Tried a couple of different encodings, and I'm still at square one. The software consists of legacy applications written by consumers of the services, which expect anything above 127 (0x7F) to be escaped. – jerhewet Feb 03 '11 at 16:41
@jerhewet: Sounds pretty bad. Tell them your update is making proper Unicode support mandatory. – Matti Virkkunen Feb 03 '11 at 16:51