
Question
Should whitespace at the beginning of my multi-line XML string literal be ignored?

Code

string XML = @"
            <?xml version=""1.0"" encoding=""utf-8"" ?>";

using (StringReader stringReader = new StringReader(XML))
using (XmlReader xmlReader = XmlReader.Create(stringReader,
    new XmlReaderSettings() { IgnoreWhitespace = true }))
{
    xmlReader.MoveToContent();
    // further implementation withheld
}

Notice that in the code above there is whitespace before the XML declaration. It does not seem to be ignored, despite my setting the IgnoreWhitespace property. Where am I going wrong?

Note: I get the same behaviour when the XML string has no line break and just a single leading space, as below. I know this will run if I remove the whitespace; my question is why the property doesn't take care of it.

string XML = @" <?xml version=""1.0"" encoding=""utf-8"" ?>";
George Grainger
  • do this: `using (StringReader stringReader = new StringReader(XML.Trim()))` to remove white spaces – rashfmnb Dec 15 '16 at 15:26
  • The [IgnoreWhiteSpace property](https://msdn.microsoft.com/en-us/library/system.xml.xmlreadersettings.ignorewhitespace(v=vs.110).aspx) setting does not affect white space between markup in a mixed content mode, or white space that occurs within the scope of an xml:space='preserve' attribute. – RamblinRose Dec 15 '16 at 15:28
  • @RamblinRose Thanks, I did see that on MSDN. What is "mixed content mode"? – George Grainger Dec 15 '16 at 15:29
  • `XML = XML.Trim();` should fix your problem – M.kazem Akhgary Dec 15 '16 at 15:30
  • @GeorgeGrainger I believe an example would be XHTML where elements and text are interspersed, here's a decent [description](http://docstore.mik.ua/orelly/xml/schema/ch14_02.htm). – RamblinRose Dec 15 '16 at 15:40
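
To make the boundary quoted from MSDN above more concrete, here is a minimal sketch of what the setting appears to control: whether the reader reports whitespace-only nodes between markup. The names `WhitespaceNodeDemo` and `CountWhitespaceNodes` are hypothetical helpers invented for this illustration, and the sample documents are assumptions, not taken from the question.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceNodeDemo
{
    // Hypothetical helper: counts the Whitespace and SignificantWhitespace
    // nodes that XmlReader reports for the given document.
    public static int CountWhitespaceNodes(string xml, bool ignoreWhitespace)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = ignoreWhitespace };
        int count = 0;

        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            while (xmlReader.Read())
            {
                if (xmlReader.NodeType == XmlNodeType.Whitespace ||
                    xmlReader.NodeType == XmlNodeType.SignificantWhitespace)
                {
                    count++;
                }
            }
        }

        return count;
    }

    public static void Main()
    {
        // Indentation between elements, no xml:space attribute.
        string plain = "<root>\n  <a/>\n  <b/>\n</root>";
        // Same kind of indentation, but inside an xml:space="preserve" scope.
        string preserved = "<root xml:space=\"preserve\">\n  <a/>\n</root>";

        Console.WriteLine(CountWhitespaceNodes(plain, false));    // whitespace nodes are reported
        Console.WriteLine(CountWhitespaceNodes(plain, true));     // whitespace nodes are dropped
        Console.WriteLine(CountWhitespaceNodes(preserved, true)); // preserved whitespace is still reported
    }
}
```

Note that both sample documents are already well-formed. The setting filters nodes out of a document the reader can parse; it does not relax what counts as a parseable document.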

2 Answers


The documentation says that the IgnoreWhitespace property "Gets or sets a value indicating whether to ignore insignificant white space." While that leading whitespace (and the line break) looks like it should be insignificant, whoever wrote XmlReader apparently didn't think so. Just trim the XML string before use, and you'll be fine.

As stated in the comments, and for clarity, change your code to:

string XML = @"<?xml version=""1.0"" encoding=""utf-8"" ?>";

// Trim() removes any leading or trailing whitespace before parsing.
using (StringReader stringReader = new StringReader(XML.Trim()))
using (XmlReader xmlReader = XmlReader.Create(stringReader,
    new XmlReaderSettings() { IgnoreWhitespace = true }))
{
    xmlReader.MoveToContent();
    // further implementation withheld
}
K Ekegren
  • Thank you, I do know how to correct it so it runs. What I was looking for were the boundaries of the IgnoreWhiteSpace property, and why in this instance it doesn't work. – George Grainger Dec 15 '16 at 15:54

According to Microsoft's documentation on the XML declaration:

The XML declaration typically appears as the first line in an XML document. The XML declaration is not required, however, if used it must be the first line in the document and no other content or white space can precede it.

The parse should fail for your code because white space precedes the XML declaration. Removing either the white space or the XML declaration will result in a successful parse.

In other words, this is defined behavior: it would be a bug if XmlReaderSettings were at odds with the documentation for the XML declaration.

Here's some code demonstrating the above rules.

using System;
using System.Xml.Linq;

public class Program
{
    public static void Main()
    {
        // The XML declaration is not required; however, if used, it must
        // be the first line in the document, and no other content or
        // white space can precede it.

        // No problem here, because this string has no XML declaration.
        string xml = @"
                         <xml></xml>";
        XDocument doc = XDocument.Parse(xml);
        Console.WriteLine(doc.Declaration);
        Console.WriteLine(doc);

        // Problem here, because white space precedes the XML declaration.
        xml = @"
        <?xml version=""1.0"" encoding=""utf-8"" ?><xml></xml>";
        try
        {
            doc = XDocument.Parse(xml);
            Console.WriteLine(doc.Declaration);
            Console.WriteLine(doc);
        }
        catch (Exception e)
        {
            // Parse throws, and the message explains the declaration rule.
            Console.WriteLine(e.Message);
        }
    }
}
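
The same rule can be checked against the XmlReader setup from the question. The following is a minimal sketch, assuming a hypothetical helper named `CanParse` and a made-up `<root/>` element added so that there is content to move to; neither appears in the original code.

```csharp
using System;
using System.IO;
using System.Xml;

public static class LeadingWhitespaceDemo
{
    // Hypothetical helper: returns true if XmlReader can move to the first
    // content node without throwing, with IgnoreWhitespace set to true.
    public static bool CanParse(string xml)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        try
        {
            using (var stringReader = new StringReader(xml))
            using (var xmlReader = XmlReader.Create(stringReader, settings))
            {
                xmlReader.MoveToContent();
                return true;
            }
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string withDeclaration = " <?xml version=\"1.0\" encoding=\"utf-8\" ?><root/>";
        string withoutDeclaration = " <root/>";

        // Leading space before the declaration: fails even with IgnoreWhitespace = true.
        Console.WriteLine(CanParse(withDeclaration));
        // Same string trimmed: the declaration is now the first thing in the document.
        Console.WriteLine(CanParse(withDeclaration.Trim()));
        // Leading space but no declaration: parses.
        Console.WriteLine(CanParse(withoutDeclaration));
    }
}
```

This suggests the setting only filters whitespace out of a document that is already well-formed, which is why trimming the string is still needed when a declaration is present.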
RamblinRose