
I have XML in the following format:

```xml
<Tag>
    Value
</Tag>
```

This comes from an external data source that I cannot change. When I read it with `XmlReader` as follows, the value I get back still contains the line breaks and whitespace around `Value`:

```csharp
XmlReaderSettings xmlSettings = new XmlReaderSettings();
xmlSettings.Schemas = new System.Xml.Schema.XmlSchemaSet();
XmlReader schemaReader = XmlReader.Create(xsdStream);
xmlSettings.Schemas.Add("", schemaReader);
xmlSettings.ValidationType = ValidationType.Schema;
XmlReader reader = XmlReader.Create(xmlFilename, xmlSettings);
// Parse the XML file.
while (reader.Read())
{
    if (reader.IsStartElement())
    {
        switch (reader.Name)
        {
            case "Tag":
                string value = reader.ReadElementContentAsString();
                Console.WriteLine(value);
                break;
        }
    }
}
```
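For experimenting without the external file, here is a minimal self-contained reproduction. It is a sketch under two assumptions: the XML arrives as an in-memory string, and schema validation is left out because it does not change the text content of `<Tag>`. The class and method names are hypothetical. It brackets the value and escapes the line breaks so the surrounding whitespace is visible:

```csharp
using System;
using System.IO;
using System.Xml;

public class WhitespaceRepro
{
    // Returns the content of the first <Tag> element exactly as XmlReader reports it.
    public static string ReadRawTag(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.IsStartElement() && reader.Name == "Tag")
                {
                    return reader.ReadElementContentAsString();
                }
            }
        }
        return null; // no <Tag> element found
    }

    static void Main()
    {
        string value = ReadRawTag("<Tag>\n    Value\n</Tag>");
        // Escape the line breaks and bracket the value so the whitespace around it is visible.
        Console.WriteLine("[" + value.Replace("\n", "\\n") + "]");
    }
}
```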

How can I avoid this?

Razer
  • Any reason not to use `value.Trim()`? And do you really need to use `XmlReader` instead of LINQ to XML or similar? (Unless you're reading a huge document, it's *much* simpler to parse the whole thing into a DOM first.) – Jon Skeet May 04 '13 at 16:25

1 Answer


Not working answer

This answer doesn't seem to work, but I'm leaving it for the moment to avoid anyone else suggesting it. I'll delete this if someone posts a better answer.

Did you try setting `XmlReaderSettings.IgnoreWhitespace`?

> White space that is not considered to be significant includes spaces, tabs, and blank lines used to set apart the markup for greater readability. An example of this is white space in element content.

For some reason this doesn't affect `ReadElementContentAsString`, or even the `Value` property of a text node.
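A small sketch of what `IgnoreWhitespace` actually changes. It assumes a hypothetical wrapping `<Root>` element so that there is whitespace between elements as well as inside `<Tag>`; the class and method names are also hypothetical. With the setting on, the whitespace-only nodes between elements disappear, but the text node inside `<Tag>` still carries its leading and trailing whitespace:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class IgnoreWhitespaceDemo
{
    // Lists the node types the reader reports, e.g. "Element,Whitespace,...".
    // Text nodes also show their value, with line breaks escaped as \n.
    public static string DescribeNodes(string xml, bool ignoreWhitespace)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = ignoreWhitespace;

        StringBuilder result = new StringBuilder();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (result.Length > 0)
                {
                    result.Append(',');
                }
                result.Append(reader.NodeType.ToString());
                if (reader.NodeType == XmlNodeType.Text)
                {
                    result.Append('[').Append(reader.Value.Replace("\n", "\\n")).Append(']');
                }
            }
        }
        return result.ToString();
    }

    static void Main()
    {
        // Hypothetical <Root> wrapper so there is whitespace between elements as well.
        string xml = "<Root>\n  <Tag>\n    Value\n  </Tag>\n</Root>";
        Console.WriteLine(DescribeNodes(xml, false));
        Console.WriteLine(DescribeNodes(xml, true));
    }
}
```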

Simple answer

You could just call `Trim`:

```csharp
string value = reader.ReadElementContentAsString().Trim();
```

That won't remove line breaks between lines of content, of course. If you need to do that, you could always use `string.Replace`.
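A sketch of both options as plain helper functions (the class and method names are hypothetical). `Trim` covers the question's case. For values that span several lines, this sketch swaps in `Regex.Replace` rather than `string.Replace`, so that the indentation following each line break is collapsed too; turning every run of whitespace into a single space is an assumption about your data:

```csharp
using System;
using System.Text.RegularExpressions;

public class TagValueCleanup
{
    // Removes leading and trailing whitespace only; line breaks inside the value are kept.
    public static string TrimOnly(string raw)
    {
        return raw.Trim();
    }

    // Trims, then replaces every run of whitespace (line breaks plus the indentation
    // that follows them) with a single space.
    public static string CollapseWhitespace(string raw)
    {
        return Regex.Replace(raw.Trim(), @"\s+", " ");
    }

    static void Main()
    {
        Console.WriteLine(TrimOnly("\n    Value\n"));
        Console.WriteLine(CollapseWhitespace("\n    First line\n    Second line\n"));
    }
}
```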

(As I mentioned in the comment, I'd personally prefer LINQ to XML over `XmlReader` unless you're genuinely reading something too large to fit in memory, but that's a separate matter.)
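For comparison, a LINQ to XML sketch. It requires .NET 3.5 or later, and the `<Root>` wrapper, class name and method name are assumptions for illustration. The whole document is loaded into memory and each `<Tag>` value is trimmed. Schema validation is left out here; the `Validate` extension method in `System.Xml.Schema` can be applied to an `XDocument` if you need it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class LinqToXmlSketch
{
    // Returns the trimmed value of every <Tag> element in an already-loaded document.
    public static List<string> ReadTagValues(XDocument doc)
    {
        return doc.Descendants("Tag").Select(e => e.Value.Trim()).ToList();
    }

    static void Main()
    {
        // With a real file this would be XDocument.Load(xmlFilename).
        XDocument doc = XDocument.Parse("<Root>\n  <Tag>\n    Value\n  </Tag>\n</Root>");
        foreach (string value in ReadTagValues(doc))
        {
            Console.WriteLine(value);
        }
    }
}
```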

Jon Skeet
  • `XmlReaderSettings.IgnoreWhitespace` only affects text nodes that consist solely of whitespace. In this case `<Tag>` contains the text node "\r\n····Value\r\n". – Michael Liu May 04 '13 at 16:35
  • Thank you for your response. I would personally also like to switch to LINQ, but my project needs to build with `.NET 2.0`. – Razer May 04 '13 at 16:36
  • @Razer: You could still use `XmlDocument` which would be easier to use than `XmlReader` - at least, I certainly find it easier. – Jon Skeet May 04 '13 at 16:43
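A sketch of the `XmlDocument` route, using only APIs that exist in .NET 2.0. It keeps the validating `XmlReaderSettings` from the question, with an in-memory schema and document standing in for `xsdStream` and `xmlFilename`; the `<Root>` element, the schema itself, and the class and method names are hypothetical. Validation runs while `XmlDocument.Load` consumes the reader, and the values still need `Trim`, because the DOM keeps the same text node the reader saw.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class XmlDocumentSketch
{
    // Loads the whole document from the given reader (validation, if configured on the
    // reader, runs during Load) and returns the trimmed text of every <Tag> element.
    public static string[] ReadTagValues(XmlReader reader)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(reader);

        XmlNodeList tags = doc.GetElementsByTagName("Tag");
        string[] values = new string[tags.Count];
        for (int i = 0; i < tags.Count; i++)
        {
            values[i] = tags[i].InnerText.Trim();
        }
        return values;
    }

    static void Main()
    {
        // Hypothetical in-memory schema and document standing in for xsdStream and xmlFilename.
        string xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "<xs:element name='Root'><xs:complexType><xs:sequence>" +
            "<xs:element name='Tag' type='xs:string' maxOccurs='unbounded'/>" +
            "</xs:sequence></xs:complexType></xs:element>" +
            "</xs:schema>";
        string xml = "<Root>\n  <Tag>\n    Value\n  </Tag>\n</Root>";

        // Same validation settings as in the question.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.Schemas = new XmlSchemaSet();
        settings.Schemas.Add("", XmlReader.Create(new StringReader(xsd)));
        settings.ValidationType = ValidationType.Schema;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            foreach (string value in ReadTagValues(reader))
            {
                Console.WriteLine(value);
            }
        }
    }
}
```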