
I have an XML string that contains an apostrophe. I replace the apostrophe with its character entity reference (&#39;) and parse the revised string into an XElement. The XElement, however, turns the &#39; back into an apostrophe.

How do I force XElement.Parse to preserve the encoded string?

string originalXML = @"<Description><data>Mark's Data</data></Description>"; //for illustration purposes only
string encodedApostrophe = originalXML.Replace("'", "&#39;");
XElement xe = XElement.Parse(encodedApostrophe);
Mark Maslar
  • Why do you need it to? They're equivalent in XML. – Jon Skeet Oct 29 '11 at 18:13
  • So far I have had exactly one case where this kind of "encoding" was explicitly required by a client when writing the XML, but never when reading it. If you process it in .NET, there is no difference between the two forms (so there is nothing to preserve). If you need to process it outside .NET and the encoding is absolutely necessary, you would have to write it out the same way you wrote it the first time. – Yahia Oct 29 '11 at 18:20
  • RE: Why do I need to? Downstream, the XML gets embedded in some dynamically generated JavaScript. The embedded apostrophe breaks the JavaScript string. – Mark Maslar Oct 29 '11 at 18:30
  • Can't you fix the JavaScript then? That looks like a proper solution. – svick Oct 29 '11 at 18:48
  • RE: Can't you fix the JavaScript. It looks something like this (EXTREMELY simplified): var myString = "the XElement.Value and some other dynamic values marked with apostrophes. Having a stray apostrophe in the XElement.Value causes a JavaScript error. "; – Mark Maslar Oct 29 '11 at 18:58

1 Answer


This is correct behavior. Wherever ' is allowed, it is equivalent to &apos;, &#39; or &#x27;, so the parser decodes all of them to the same character. If you want the XML to contain the literal string &#39;, you should encode the &:

originalXML.Replace("'", "&amp;#39;")

Or parse the original XML and modify that:

XElement xe = XElement.Parse(originalXML);

var data = xe.Element("data");

data.Value = data.Value.Replace("'", "&#39;");
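To make the difference concrete, here is a minimal sketch comparing the question's approach with the two variants above. It assumes a C# 9 or later console project (top-level statements, nullable `!` operator); the variable names `decoded`, `literal`, and `modified` are illustrative only.

```csharp
using System;
using System.Xml.Linq;

string originalXML = "<Description><data>Mark's Data</data></Description>";

// 1. The question's approach: the parser decodes &#39; back to an apostrophe.
XElement decoded = XElement.Parse(originalXML.Replace("'", "&#39;"));
Console.WriteLine(decoded.Element("data")!.Value);   // Mark's Data

// 2. Escaping the ampersand: the parsed value contains the literal text &#39;.
XElement literal = XElement.Parse(originalXML.Replace("'", "&amp;#39;"));
Console.WriteLine(literal.Element("data")!.Value);   // Mark&#39;s Data

// 3. Modifying the parsed tree: LINQ to XML escapes the & when serializing.
XElement modified = XElement.Parse(originalXML);
XElement data = modified.Element("data")!;
data.Value = data.Value.Replace("'", "&#39;");
Console.WriteLine(modified.ToString(SaveOptions.DisableFormatting));
// <Description><data>Mark&amp;#39;s Data</data></Description>
```

Variants 2 and 3 end up with the same element value. Variant 3 only touches text content, whereas a replace on the raw string would also rewrite any apostrophes used as attribute delimiters and make the XML malformed.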

But doing this seems really weird. Maybe there is a better solution to the problem you're trying to solve.

Also, this encoding is not an “ASCII equivalent”; these are called character entity references, and the numeric ones are based on the Unicode code point of the character.

svick
  • Thx! I've edited the question to reflect the correct nomenclature: "character entity references" – Mark Maslar Oct 29 '11 at 18:43
  • RE: "Maybe there is a better solution..." I'm open to suggestion! The source XML string may contain an apostrophe. It needs to get stored in an XElement, and *must not* contain an apostrophe there. Additionally, the values within the XElement really ought to be HTML encoded too. – Mark Maslar Oct 29 '11 at 18:49
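One possible direction, following the "fix the JavaScript" suggestion in the comments: leave the XElement value untouched and escape it only at the point where it is embedded. The sketch below is an assumption about how the downstream script is built, not the asker's actual code. It uses `HttpUtility.JavaScriptStringEncode` and `HttpUtility.HtmlEncode` from `System.Web` (on .NET Framework this needs a reference to the System.Web assembly), and the names `raw`, `jsSafe`, `htmlSafe`, and `script` are hypothetical.

```csharp
using System;
using System.Web;
using System.Xml.Linq;

XElement xe = XElement.Parse("<Description><data>Mark's Data</data></Description>");

// The XElement keeps the plain apostrophe; nothing is pre-encoded in the tree.
string raw = xe.Element("data")!.Value;

// Escape for a JavaScript string literal at the point of embedding.
string jsSafe = HttpUtility.JavaScriptStringEncode(raw);
string script = "var myString = '" + jsSafe + "';";
Console.WriteLine(script);

// HTML-encode separately, and only where the value is written into markup.
string htmlSafe = HttpUtility.HtmlEncode(raw);
Console.WriteLine(htmlSafe);
```

Encoding per output context keeps the stored XML free of double-encoded text, and each consumer (JavaScript, HTML) gets the escaping it actually requires.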