I am trying to parse the following data from an HTML document using HtmlAgilityPack:
<a href="http://abilene.craigslist.org/">abilene</a> <br>
<a href="http://albany.craigslist.org/"><b>albany</b></a> <br>
<a href="http://amarillo.craigslist.org/">amarillo</a> <br>
...
I would like to parse out the URL and the name of each city into two separate files.
Example:
urls.txt
"http://abilene.craigslist.org/"
"http://albany.craigslist.org/"
"http://amarillo.craigslist.org/"
cities.txt
abilene
albany
amarillo
Here is what I have so far:
public void ParseHtml()
{
    // Clear the output text box
    textBox1.Clear();

    // Managed wrapper around the HTML Document Object Model (DOM)
    HtmlAgilityPack.HtmlDocument hDoc = new HtmlAgilityPack.HtmlDocument();

    // Load the file
    hDoc.Load(@"c:\AllCities.html");

    // Execute the XPath query entered in the xpathText box.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection nodes = hDoc.DocumentNode.SelectNodes(xpathText.Text);
    if (nodes == null)
    {
        textBox1.Text += "Can't process XPath query, modify it and try again.";
        return;
    }

    foreach (HtmlNode hNode in nodes)
    {
        textBox1.Text += hNode.InnerHtml + "\r\n";
    }
}
Any help would be greatly appreciated! Thanks guys!