I am trying to find a way to parse an XML element whose content arrives wrapped in CDATA for some inputs, but not for all.

For example, the following is sample content I receive where the data is wrapped in CDATA. In other scenarios the CDATA tags are omitted.

<Data><![CDATA[ <h1>CHAPTER 2<br/> EDUCATION</h1>
                <P>  Analysis paragraph  </P> ]]></Data>

Is there an elegant way to detect that and implement a ReadXml method that can parse both types of input (with or without CDATA)? So far my ReadXml() implementation is as follows, but I am getting parsing errors when the CDATA tag is omitted.

    public void ReadXml(XmlReader reader)
    {
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();
        if (isEmpty)
        {
            _data = string.Empty;
        }
        else
        {                
            switch (reader.MoveToContent())
            {
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    _data = reader.ReadContentAsString();
                    break;
                default:
                    _data = string.Empty;
                    break;
            }
            reader.ReadEndElement();
        }                         
    }
jvtech
  • Could you provide an example of failing code? If I pass in nocdata to your ReadXml function it works just fine. – Mikael Svenson Feb 16 '10 at 18:43
  • It's failing for me when the tag does not have the surrounding CDATA tags. Are you sure it worked for you after you removed the CDATA tag from the sample I have above? I am getting an error when performing reader.ReadEndElement()... – jvtech Feb 16 '10 at 19:03
  • I tested on a simpler one. Check my answer for code sample. – Mikael Svenson Feb 16 '10 at 20:27
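
The error at reader.ReadEndElement() described in the comments is consistent with how XmlReader reports the first content node: with CDATA the reader sees a single CDATA node, but without it the first node is the nested <h1> element, so the switch falls into the default branch and the reader is still sitting on <h1> when ReadEndElement() is called. A minimal sketch that shows the two node types (NodeTypeProbe and FirstContentNode are hypothetical names used only for illustration):

```csharp
using System;
using System.IO;
using System.Xml;

public static class NodeTypeProbe
{
    // Hypothetical helper: returns the type of the first content node
    // found right after the opening tag of the root element.
    public static XmlNodeType FirstContentNode(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadStartElement();      // consume <Data>
            return reader.MoveToContent();  // same call the question's switch uses
        }
    }

    public static void Main()
    {
        // With CDATA the content is one CDATA node.
        Console.WriteLine(FirstContentNode("<Data><![CDATA[ <h1>x</h1> ]]></Data>"));
        // Without CDATA the first content node is the <h1> element itself.
        Console.WriteLine(FirstContentNode("<Data><h1>x</h1></Data>"));
    }
}
```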

1 Answer

The code below is tested on the following samples:

<Data><h1>CHAPTER 2<br/> EDUCATION</h1><P>  Analysis paragraph  </P></Data>
<Data>test<h1>CHAPTER 2<br/> EDUCATION</h1><P>  Analysis paragraph  </P></Data>
<Data><![CDATA[ <h1>CHAPTER 2<br/> EDUCATION</h1><P>  Analysis paragraph  </P> ]]></Data>
<Data></Data>

I use an XPathNavigator instead as it allows backtracking.

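public void ReadXml(XmlReader reader)
{
    // Requires: using System.Xml; using System.Xml.XPath;
    XmlDocument doc = new XmlDocument {PreserveWhitespace = false};
    doc.Load(reader);

    var navigator = doc.CreateNavigator();
    navigator.MoveToChild(XPathNodeType.Element);
    // CDATA content shows up entity-escaped in InnerXml (it starts with "&lt;"),
    // so take the unescaped Value; otherwise keep the markup as-is.
    _data = navigator.InnerXml.Trim().StartsWith("&lt;") ? navigator.Value : navigator.InnerXml;
}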
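
One caveat: when the reader is already positioned inside a larger document, doc.Load(reader) loads the current node and its following siblings rather than just <Data>, which can fail or consume more than intended. A minimal sketch of an alternative that reads only the current element, assuming LINQ to XML (System.Xml.Linq) is available; DataElementReader and ReadContent are hypothetical names, and inside ReadXml the result would be assigned to _data:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class DataElementReader
{
    // Hypothetical helper: reads the element the reader is positioned on
    // and returns its content, with or without a CDATA wrapper.
    public static string ReadContent(XmlReader reader)
    {
        reader.MoveToContent();
        // XNode.ReadFrom consumes only the current node and leaves the
        // reader positioned just after it. Assumes the node is an element.
        var element = (XElement)XNode.ReadFrom(reader);

        // Child elements present: the markup was not wrapped in CDATA, so
        // re-serialize the child nodes. Otherwise the text (CDATA or plain)
        // is returned unescaped.
        return element.HasElements
            ? string.Concat(element.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)))
            : element.Value;
    }
}

public static class Demo
{
    public static void Main()
    {
        const string xml =
            "<Root><Data><![CDATA[ <h1>A</h1> ]]></Data><Data><h1>B</h1></Data><Next/></Root>";
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadStartElement("Root");
            Console.WriteLine(DataElementReader.ReadContent(reader)); // CDATA variant
            Console.WriteLine(DataElementReader.ReadContent(reader)); // plain markup variant
            Console.WriteLine(reader.Name); // the reader can keep going with the next sibling
        }
    }
}
```

Re-serializing the child nodes may normalize the markup slightly (an empty element such as <br/> can come back as <br />), so this suits cases where the content is treated as markup rather than compared byte for byte.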
Mikael Svenson
  • That does the trick. I ended up using Xnode instead of XmlDocument, and then its CreateNavigator method to get an XPathNavigator to use to retrieve the InnerXml. – jvtech Feb 16 '10 at 21:20
  • Using an XmlNode is probably better, and glad it worked. Feel free to mark the answer as accepted as well :) – Mikael Svenson Feb 16 '10 at 21:30
  • Using XmlDocument.Load and then getting XmlNode does not work for me. The sample XML in the example I gave is just one of the nodes in the actual input data (the actual input has a quite complex XML structure). So if I try to do XmlDocument.Load when parsing this particular node, I get errors and cannot read further. – jvtech Feb 16 '10 at 21:41