
I'm following an RSS feed, which returns XML. Inside the XML are HTML tables, each returned as one long string. I'm trying to access the elements of these HTML tables with C# so that I can use each of them as a variable in another program. An example of a table:

<table cellpadding="5"><tr><td><strong>Date (GMT)</strong></td><td><strong>Event</strong></td><td><strong>Cons.</strong></td><td><strong>Actual</strong></td><td><strong>Previous</strong></td></tr><tr><td>Jun 7 11:00</td><td>Announcement</td><td>6.250 %</td><td>6.310  %</td><td>6.560  %</td></tr></table>

Just about every similar thread on here has suggested HtmlAgilityPack, which I'm trying to use. So far, I've been able to pull out the HTML table and store it in a string variable, but I can't pull out the individual table elements. The following is my attempt, based on several users' suggestions:

XmlDocument xDoc = new XmlDocument();
xDoc.Load("http://rssfeed.com");
string descr = xDoc.SelectSingleNode("rss/channel/item/description").InnerText;

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml("descr");
// A Print statement here (textBox1.Text = descr;) shows that I'm successfully accessing the HTML table
var table = doc.DocumentNode.Descendants("tr")
    .Select(n => n.Elements("td").Select(o => o.InnerText).ToArray());

foreach (var tr in table)
{
    textBox1.Text = String.Format("{0} {1} {2}", tr[0], tr[1], tr[2]);
}

Any and all suggestions are extremely welcome.

Thanks, D

user1442073
  • The HTML Agility Pack is best used for HTML from unknown sources that may not be well structured. Seeing as you have XML and the embedded HTML tables _are_ well-formed XML as well, just use the `XmlDocument` as you do (or perhaps `XDocument`, if you can). – Oded Jun 07 '12 at 12:12
  • Oded, thanks for your reply. I actually tried that initially but after several failed attempts, a ton of searches led me to the Agility Pack. That said, if you can point me to an example using just XmlDocument, since I've been unable to find one on my own, I would greatly appreciate it. - Thanks – user1442073 Jun 07 '12 at 13:14

1 Answer


This worked for me, and it will work for you as long as the HTML is well-formed XML and the values are always inside a `td`. The `Value` of a `td` that contains a single element (such as the `strong` elements in the header row) is the same as that element's value.

XElement table = XElement.Parse("<table cellpadding=\"5\"><tr><td><strong>Date (GMT)</strong></td><td><strong>Event</strong></td><td><strong>Cons.</strong></td><td><strong>Actual</strong></td><td><strong>Previous</strong></td></tr><tr><td>Jun 7 11:00</td><td>Announcement</td><td>6.250 %</td><td>6.310  %</td><td>6.560  %</td></tr></table>");
string[] values = table.Descendants("td").Select(td => td.Value).ToArray();

Or, to get each row as an array of its cell values:

var rows = table.Elements()
    .Select(tr => tr.Elements().Select(td => td.Value).ToArray())
    .ToList();

Update, to print the values:

foreach (string value in values)
    Console.WriteLine(value);

foreach (string[] row in rows)
    foreach (string value in row)
        Console.WriteLine(value);
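
For a version that compiles on its own, a minimal sketch might look like the following. The names `TableParser` and `ParseRows` are hypothetical and only there for illustration, and the hard-coded string stands in for the `InnerText` of the feed's `description` node. Note the `using System.Linq;` and `using System.Xml.Linq;` directives: without them, `Select` and `XElement` will not resolve, which may be why `foreach` appeared not to work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, named here only for illustration.
public static class TableParser
{
    // Parses a well-formed HTML table and returns one string array per row.
    // Assumes every value sits inside a <td>; XElement.Parse throws an
    // XmlException if the markup is not well-formed XML.
    public static List<string[]> ParseRows(string html)
    {
        XElement table = XElement.Parse(html);
        return table.Descendants("tr")
            .Select(tr => tr.Elements("td").Select(td => td.Value.Trim()).ToArray())
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample table from the question; in practice this string would come
        // from the RSS item's description (assumption).
        string descr =
            "<table cellpadding=\"5\">" +
            "<tr><td><strong>Date (GMT)</strong></td><td><strong>Event</strong></td>" +
            "<td><strong>Cons.</strong></td><td><strong>Actual</strong></td>" +
            "<td><strong>Previous</strong></td></tr>" +
            "<tr><td>Jun 7 11:00</td><td>Announcement</td><td>6.250 %</td>" +
            "<td>6.310  %</td><td>6.560  %</td></tr></table>";

        // Print each row on its own line, cells separated by " | ".
        foreach (string[] row in TableParser.ParseRows(descr))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

The first array in the result is the header row, so skipping it (for example with `.Skip(1)`) leaves only the data rows. In a WinForms app, assigning to `textBox1.Text` inside the loop keeps only the last row; appending with `+=` or `AppendText` keeps all of them.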
Chuck Savage
  • Chuck, this looks extremely promising. Thanks. Any chance I can get you to post the code in its entirety? I'm brand new to XElement and "foreach" doesn't seem to want to work with it so I'm not sure how to print it out. P.S. I tried voting this answer as useful but I'm afraid I lack the requisite rep points. – user1442073 Jun 07 '12 at 14:14