I have a string containing xml e.g.:

<Person Name="Molly O&apos;Mahony" />

I want to convert this to an XElement while keeping the &apos; entity, rather than having it converted to '.

Calling

XElement.Parse(string);

creates the element <Person Name="Molly O'Mahony" />. Is it possible to preserve the entity?

binncheol
  • What makes you think it will be converted? What code do you have now? – DavidG Dec 13 '17 at 16:24
  • using XElement.Parse(string) converts it – binncheol Dec 13 '17 at 16:26
  • @dbc OP means value in attribute. His sample is not valid xml anyway, but he means something like `` – Evk Dec 13 '17 at 16:27
  • Sorry, I fixed the (bad) example I had posted above. – binncheol Dec 13 '17 at 16:32
  • Why do you care about the encoding anyway? It's the same thing in XML. – DavidG Dec 13 '17 at 16:35
  • @DavidG because there will be a CRC check at some point – binncheol Dec 13 '17 at 16:37
  • Then you can't parse it into an `XElement` unless you manually replace all apostrophes with the encoded value when you export back to string. – DavidG Dec 13 '17 at 16:38
  • Entity expansion is done at the low level by `XmlReader`. As shown [here](https://stackoverflow.com/a/33255946/3744182) and [here](https://stackoverflow.com/a/42765970/3744182), if you switch to `XmlTextReader` you can disable expansion of *general* entities by setting `EntityHandling = EntityHandling.ExpandCharEntities`, but there isn't a way to disable expansion of character entities. The suggestion is to create your own subclass of `XmlReader`. (And of course you would need to subclass `XmlWriter` to avoid escaping them when serializing.) – dbc Dec 13 '17 at 17:38
  • But `&apos;` is a [predefined XML entity](https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML). *Every* XML reader in *every* framework is going to convert it to an apostrophe because the XML spec *requires that all XML processors honor them.* Are you really sure this will cause problems down the road? – dbc Dec 13 '17 at 17:41
  • If you are going to have CRC check, I suppose it will be on whole xml document, not on separate attributes? – Evk Dec 13 '17 at 17:47
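
DavidG's suggestion in the comments (re-encode the apostrophes when exporting back to a string) could look roughly like the sketch below. The class and method names are hypothetical, and the blanket `Replace` is an assumption: it also rewrites apostrophes in text nodes, comments, and CDATA sections, so it only suits documents where every apostrophe was originally written as `&apos;`.

```csharp
using System.Xml.Linq;

static class ApostropheEncoding
{
    // Hypothetical helper, not part of LINQ to XML.
    public static string Serialize(XElement element)
    {
        // ToString() writes attribute values in double quotes,
        // so apostrophes come out as literal ' characters.
        var xml = element.ToString(SaveOptions.DisableFormatting);

        // Assumption: every apostrophe in the output should be entity-encoded.
        return xml.Replace("'", "&apos;");
    }
}
```

Calling `ApostropheEncoding.Serialize(XElement.Parse(txt))` re-encodes the attribute value, but nothing else about the original formatting (whitespace, quote style, attribute order) is guaranteed to survive, which matters if the CRC covers the whole document.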

1 Answer

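You can't keep the entity inside the XElement, but you can recover the raw attribute text from the original string by using the line info that XElement.Parse records: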

// Requires: using System.Xml; (IXmlLineInfo) and using System.Xml.Linq; (XElement, LoadOptions)
var txt = "<Person Name=\"Molly O&apos;Mahony\" />";

var e  = XElement.Parse(txt, LoadOptions.SetLineInfo);
var li = (IXmlLineInfo) e.Attribute("Name");

// li.LinePosition points to first char of the attribute: Name="Molly O&apos;Mahony"
//                                                        ^

var start = txt.IndexOf('"', li.LinePosition) + 1;  
var end   = txt.IndexOf('"', start);
var len   = end - start;

var attr  = txt.Substring(start, len); // Molly O&apos;Mahony

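This assumes the XML sits on a single line and that the attribute value is delimited by double quotes. For multi-line text you would need to adjust the code so that li.LineNumber is taken into account as well.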
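
One way the multi-line adjustment might look is sketched below. `RawAttribute`, `ToOffset`, and `Read` are hypothetical names, and the sketch assumes that lines end in `\n` or `\r\n` and that the attribute value does not contain its own quote character.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class RawAttribute
{
    // Converts a 1-based (line, column) pair into a 0-based index into txt.
    // Assumption: every line break ends with '\n'.
    static int ToOffset(string txt, int line, int column)
    {
        var offset = 0;
        for (var current = 1; current < line; current++)
        {
            offset = txt.IndexOf('\n', offset) + 1;
        }
        return offset + column - 1;
    }

    // Returns the attribute value exactly as written in txt (entities unexpanded).
    // attr must come from a tree parsed from txt with LoadOptions.SetLineInfo.
    public static string Read(string txt, XAttribute attr)
    {
        var li = (IXmlLineInfo) attr;
        if (!li.HasLineInfo())
        {
            throw new InvalidOperationException("Parse with LoadOptions.SetLineInfo.");
        }

        // The line info points at the first character of the attribute name.
        var nameStart = ToOffset(txt, li.LineNumber, li.LinePosition);

        // Skip past '=' and any whitespace to reach the opening quote (" or ').
        var quoteIndex = txt.IndexOf('=', nameStart) + 1;
        while (char.IsWhiteSpace(txt[quoteIndex]))
        {
            quoteIndex++;
        }

        var quote = txt[quoteIndex];
        var start = quoteIndex + 1;
        var end = txt.IndexOf(quote, start);
        return txt.Substring(start, end - start);
    }
}
```

Usage would be along the lines of `RawAttribute.Read(txt, root.Descendants("Person").First().Attribute("Name"))` (with `using System.Linq;`), where `root` was parsed from the same `txt` with `LoadOptions.SetLineInfo`.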

3dGrabber