
A surprisingly simple question this time! :-) There's an XML file like this:

<xml>
  <data> </data>
</xml>

Now I need to read exactly whatever is in the <data> element, even if it is just a single whitespace character such as U+0020. My naive guess:

XmlDocument xd = new XmlDocument();
xd.Load(fileName);
XmlNode xn = xd.DocumentElement.SelectSingleNode("data");
string data = xn.InnerText;

But that returns an empty string. The white space got lost. Any other data can be read just fine.

What do I need to do to get my space character here?

After browsing the web for a while, I tried reading the XML file with an XmlReader, which lets me set XmlReaderSettings.IgnoreWhitespace = false, but that didn't help.

ygoe
  • This might be a behaviour of the nonstandard `InnerText` property. Have you tried reading the child text nodes directly? – millimoose Feb 10 '13 at 23:35
  • I had not, but now I have, and there is no text node for a single whitespace in the document. The solution with xml:space below works well. – ygoe Feb 11 '13 at 07:26

1 Answer


You must use xml:space="preserve" in your XML, according to the W3C standards and the MSDN docs.

The W3C standards dictate that white space be handled differently depending on where in the document it occurs, and depending on the setting of the xml:space attribute. If the characters occur within the mixed element content or inside the scope of the xml:space="preserve", they must be preserved and passed without modification to the application. Any other white space does not need to be preserved. The XmlTextReader only preserves white space that occurs within an xml:space="preserve" context.

        XmlDocument xd = new XmlDocument();
        xd.LoadXml(@"<xml xml:space=""preserve""><data> </data></xml>");
        XmlNode xn = xd.DocumentElement.SelectSingleNode("data");
        string data = xn.InnerText; // data == " "
        Console.WriteLine(data == " "); //True

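If you can't change the XML file itself, another option that should work is setting XmlDocument.PreserveWhitespace = true before loading. That would also explain why the XmlReader attempt in the question didn't help: as far as I understand it, the document drops whitespace-only nodes on load unless this property is set. A minimal sketch follows; the WhitespaceDemo class and the ReadData helper are made-up names for illustration, and it uses LoadXml with an inline string instead of your file.

```csharp
using System;
using System.Xml;

public static class WhitespaceDemo
{
    // Hypothetical helper: returns the InnerText of the first <data> child
    // of the root element, with or without whitespace preservation.
    public static string ReadData(string xml, bool preserveWhitespace)
    {
        XmlDocument xd = new XmlDocument();
        // Has to be set before loading; it does not affect nodes already parsed.
        xd.PreserveWhitespace = preserveWhitespace;
        xd.LoadXml(xml);
        XmlNode xn = xd.DocumentElement.SelectSingleNode("data");
        return xn.InnerText;
    }

    public static void Main()
    {
        // Same shape as the file in the question, no xml:space attribute.
        const string xml = "<xml>\n  <data> </data>\n</xml>";

        // Default behaviour: the whitespace-only node is dropped.
        Console.WriteLine(ReadData(xml, false) == "");

        // With PreserveWhitespace the single space is kept as a whitespace node.
        Console.WriteLine(ReadData(xml, true) == " ");
    }
}
```

I'd expect both lines to print True. Keep in mind that with PreserveWhitespace turned on, the indentation between elements also shows up as whitespace nodes in the tree, so code that walks ChildNodes may need to skip them.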

CC Inc