4

I'm using an XmlReader to iterate through some XML. Some of the XML is actually HTML and I want to get the text content from the node.

Example XML:

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <p>Here is some <b>data</b></p>
</data>

Example code:

using (XmlReader reader = XmlReader.Create(myUrl))
{
  while (reader.Read()) 
  {
    if (reader.Name == "p")
    { 
      // I want to get all the TEXT contents from this node
      myVar = reader.Value;
    }
  }
}

This doesn't get me all the contents. How do I get all the contents from the <p> node in that situation?

ryanlifferth

3 Answers

13

Use ReadInnerXml:

        StringReader myUrl = new StringReader(@"<?xml version=""1.0"" encoding=""UTF-8""?>
<data>
  <p>Here is some <b>data</b></p>
</data>");
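        using (XmlReader reader = XmlReader.Create(myUrl))
        {
            while (reader.Read())
            {
                // Match only the start tag; Name is also "p" on the end tag.
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "p")
                {
                    // Prints the inner markup: Here is some <b>data</b>
                    // ReadInnerXml leaves the reader just past </p>.
                    Console.WriteLine(reader.ReadInnerXml());
                }
            }
        }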

Or, if you want to drop the <b> markup as well, you can use an auxiliary reader for the subtree and read only the text nodes:

        StringReader myUrl = new StringReader(@"<?xml version=""1.0"" encoding=""UTF-8""?>
<data>
  <p>Here is some <b>data</b></p>
</data>");
        StringBuilder myVar = new StringBuilder();
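        using (XmlReader reader = XmlReader.Create(myUrl))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "p")
                {
                    // The subtree reader sees only <p> and its descendants.
                    using (XmlReader pReader = reader.ReadSubtree())
                    {
                        while (pReader.Read())
                        {
                            // Keep text nodes only; element tags such as <b> are skipped.
                            if (pReader.NodeType == XmlNodeType.Text)
                            {
                                myVar.Append(pReader.Value);
                            }
                        }
                    }
                }
            }
        }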

        Console.WriteLine(myVar.ToString());
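
If the document can contain several <p> elements, the subtree approach above can be wrapped in a helper that returns one string per paragraph. This is only a sketch: the names `ParagraphText` and `ReadParagraphs` are made up for illustration, and it assumes you also want CDATA sections and the whitespace between inline tags kept in the result.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of System.Xml.
public static class ParagraphText
{
    // Returns the concatenated text of every <p> element, with child markup dropped.
    public static List<string> ReadParagraphs(TextReader input)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                // React to start tags only.
                if (reader.NodeType != XmlNodeType.Element || reader.Name != "p")
                    continue;

                var text = new StringBuilder();
                using (XmlReader pReader = reader.ReadSubtree())
                {
                    while (pReader.Read())
                    {
                        switch (pReader.NodeType)
                        {
                            case XmlNodeType.Text:
                            case XmlNodeType.CDATA:
                            case XmlNodeType.Whitespace:
                            case XmlNodeType.SignificantWhitespace:
                                text.Append(pReader.Value);
                                break;
                        }
                    }
                }
                // Closing the subtree reader leaves the outer reader on </p>,
                // so the next Read() moves on without skipping a sibling <p>.
                result.Add(text.ToString());
            }
        }
        return result;
    }
}
```

For the sample document above, `ParagraphText.ReadParagraphs(myUrl)` should return a single entry, `Here is some data`.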
carlosfigueira
1

I can't upvote or comment on others' responses, so let me just say that carlosfigueira hit the nail on the head: that's exactly how you read the text value of an element. His answer helped me immensely.

For the sake of illustration, here's my code:

while (reader.Read())
{
    switch (reader.NodeType)
    {
        case XmlNodeType.Element:
        {
            if (reader.Name == "CharCode")
            {
                switch (reader.ReadInnerXml())
                {
                    case "EUR":
                    {
                        reader.ReadToNextSibling("Value");
                        label4.Text = reader.ReadInnerXml();
                    }
                    break;
                    case "USD":
                    {
                        reader.ReadToNextSibling("Value");
                        label3.Text = reader.ReadInnerXml();
                    }
                    break;
                    case "RUB":
                    {
                        reader.ReadToNextSibling("Value");
                        label5.Text = reader.ReadInnerXml();
                    }
                    break;
                    case "RON":
                    {
                        reader.ReadToNextSibling("Value");
                        label6.Text = reader.ReadInnerXml();
                    }
                    break;
                }
            }
        }
        break;
    }
}

The file I'm reading can be found here: http://www.bnm.md/md/official_exchange_rates?get_xml=1&date= (you have to append a date in DD.MM.YYYY format to get the XML).
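
The switch above repeats the same two calls for every currency. A more compact variant, sketched below, collects every CharCode/Value pair into a dictionary and leaves it to the caller to pick what to display. `RateReader` and `ReadRates` are hypothetical names, and the sketch assumes that each `CharCode` element is followed by a sibling `Value` element holding plain text, as the code above implies. It has not been checked against the live feed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper; the element names come from the code above.
public static class RateReader
{
    // Maps each currency code (e.g. "EUR") to the text of its <Value> sibling.
    public static Dictionary<string, string> ReadRates(TextReader input)
    {
        var rates = new Dictionary<string, string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.ReadToFollowing("CharCode"))
            {
                // Reads the text and moves to the node after </CharCode>.
                string code = reader.ReadElementContentAsString();

                // ReadToNextSibling always steps past the current node first,
                // so check whether the reader already sits on <Value>.
                bool onValue =
                    (reader.NodeType == XmlNodeType.Element && reader.Name == "Value")
                    || reader.ReadToNextSibling("Value");

                if (onValue)
                {
                    rates[code] = reader.ReadElementContentAsString();
                }
            }
        }
        return rates;
    }
}
```

With the result in hand, the labels can be filled after a lookup, for example `if (rates.TryGetValue("EUR", out var eur)) label4.Text = eur;`.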

0

I suggest you use HtmlAgilityPack, which is a mature and stable library for this sort of thing. It takes care of fetching the HTML, parsing it into a document tree, and letting you select the nodes you'd like with XPath.

In your case, it would be as simple as executing:

        HtmlDocument doc = new HtmlWeb().Load(myUrl);
        string text = doc.DocumentNode.SelectSingleNode("/data/p").InnerText;
Josh