
I'm currently learning C# and it's fun so far, but I have hit a roadblock.

I have a program that scrapes information from the webpage loaded in a WebBrowser control.

So far I can get the HTML:

HtmlWindow window = webBrowser1.Document.Window;
string str = window.Document.Body.OuterHtml;
richTextBox1.Text = str;

And the text:

HtmlWindow window = webBrowser1.Document.Window;
string str = window.Document.Body.OuterText;
richTextBox1.Text = str;

I have tried to scrape and display the links like this:

HtmlWindow window = webBrowser1.Document.Window;
string str = window.Document.Body.GetElementsByTagName("A").ToString();
richTextBox1.Text = str;

But instead, the RichTextBox on the form is populated with this:

System.Windows.Forms.HtmlElementCollection

Do you know how I can get a list of the links on the current webpage to show in the text box?

Thanks, Chris.
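The type name shows up because HtmlElementCollection does not override ToString(), so the call falls back to Object.ToString(), which returns the fully qualified name of the type. Any collection without its own override behaves the same way. The minimal console sketch below uses a plain List<string> as a stand-in for the WinForms collection (an assumption made so it runs without a browser); the CollectionText class and its Naive/Joined helpers are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionText
{
    // Calls ToString() on the collection object itself. List<T> does not override it,
    // so this falls back to Object.ToString() and returns only the type name.
    public static string Naive(List<string> items) => items.ToString();

    // Enumerates the collection and puts each item on its own line.
    public static string Joined(List<string> items) => string.Join(Environment.NewLine, items);

    public static void Main()
    {
        var hrefs = new List<string> { "https://example.com/a", "https://example.com/b" };

        Console.WriteLine(Naive(hrefs));   // prints System.Collections.Generic.List`1[System.String]
        Console.WriteLine(Joined(hrefs));  // prints the two URLs, one per line
    }
}
```

The fix is the same for HtmlElementCollection: enumerate the elements and build the text from each one instead of stringifying the collection.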

Asked by Gates; edited by CAbbott.

1 Answer


With the HtmlAgilityPack library it's easy:

HtmlWindow window = webBrowser1.Document.Window;
string str = window.Document.Body.OuterHtml;

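// HtmlAgilityPack is a separate library (available as a NuGet package)
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(str);

// SelectNodes returns null, not an empty collection, when nothing matches
HtmlAgilityPack.HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//a");

if (nodes != null)
{
    foreach (HtmlAgilityPack.HtmlNode node in nodes)
    {
        // OuterHtml is the whole <a ...>...</a> tag; node.GetAttributeValue("href", "") gives just the URL
        richTextBox1.Text += node.OuterHtml + "\r\n";
    }
}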
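If you would rather not add a dependency, the WebBrowser control's own DOM can be enumerated directly: iterate the HtmlElementCollection and read each element's href with HtmlElement.GetAttribute. This is a minimal sketch, not part of the original answer, assuming a form that hosts a WebBrowser named webBrowser1 and a RichTextBox named richTextBox1 (built in code here so the example is self-contained). The LinkListForm class, the FormatLinks helper and the example.com URL are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;

public class LinkListForm : Form
{
    // Hypothetical controls named to match the question's form.
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Top, Height = 300 };
    private readonly RichTextBox richTextBox1 = new RichTextBox { Dock = DockStyle.Fill };

    public LinkListForm()
    {
        // Add the Fill control first so the Top-docked browser keeps its height.
        Controls.Add(richTextBox1);
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += OnDocumentCompleted;
        webBrowser1.Navigate("https://example.com/"); // placeholder URL
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // GetElementsByTagName returns an HtmlElementCollection; enumerate it
        // instead of calling ToString(), which only yields the type name.
        HtmlElementCollection anchors = webBrowser1.Document.GetElementsByTagName("A");
        var hrefs = new List<string>();
        foreach (HtmlElement anchor in anchors)
        {
            hrefs.Add(anchor.GetAttribute("href"));
        }
        richTextBox1.Text = FormatLinks(hrefs);
    }

    // Pure helper: one non-empty href per line, so it can be checked without a browser.
    public static string FormatLinks(IEnumerable<string> hrefs)
    {
        var sb = new StringBuilder();
        foreach (string href in hrefs)
        {
            if (!string.IsNullOrEmpty(href))
            {
                sb.AppendLine(href);
            }
        }
        return sb.ToString();
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new LinkListForm());
    }
}
```

DocumentCompleted can fire more than once for pages that contain frames, so a real program may want to check e.Url before collecting. HtmlDocument also has a Links property that returns the page's hyperlinks as an HtmlElementCollection, which can be enumerated the same way.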
Answered by Steve Wellens; edited by CAbbott.