
I'm writing a little download robot that searches for links on deeper levels by itself.

What I need to find are all links in an HTML page (the links to .jpg files as well as the links to .pgn, .pdf, .html, ... files).

I'm using the HTML Agility Pack to find all a-href links.

Sample code:

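HtmlNodeCollection linkNodes = htmlDocument.DocumentNode.SelectNodes("//a[@href]");
// SelectNodes returns null, not an empty collection, when nothing matches
if (linkNodes != null)
{
    foreach (HtmlNode link in linkNodes)
    {
        HtmlAttribute attribute = link.Attributes["href"];
        links.Add(attribute.Value);
    }
}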

But I want to find the data-url values as well.

What XPath syntax do I have to use to find data-url attributes? Here is an example data-url in HTML code:

    <div class="cbreplay" data-url="2012\edmonton\partien.pgn"></div>

I need the "2012\edmonton\partien.pgn" from this example. How can I do this with XPath syntax?

Best regards. If I've made any bad mistakes, please tell me; this is my first question ever.


1 Answer


The following should do what you want:

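HtmlNodeCollection dataUrlNodes = htmlDocument.DocumentNode.SelectNodes("//div[@data-url]");
// SelectNodes returns null, not an empty collection, when nothing matches
if (dataUrlNodes != null)
{
    foreach (HtmlNode divNode in dataUrlNodes)
    {
        HtmlAttribute attribute = divNode.Attributes["data-url"];
        links.Add(attribute.Value);
    }
}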

Effectively, the expression //div[@data-url] selects every div element that has a data-url attribute. We then read the value of that attribute.

If elements other than divs carry this attribute, //*[@data-url] matches any element that has it.
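If the robot should collect the href links and the data-url values in one pass, the two expressions can be combined with the XPath union operator `|`. As far as I know, HtmlAgilityPack evaluates XPath through the standard .NET XPath engine, so the following sketch tries the expression with System.Xml alone. It assumes well-formed markup: `XmlDocument.LoadXml` rejects much real-world HTML, which is why HtmlAgilityPack is the better choice for actual pages. `LinkCollector` and `CollectLinks` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class LinkCollector
{
    // Hypothetical helper: collects href values of <a> elements and
    // data-url values of any element. Sketch only, requires well-formed markup.
    public static List<string> CollectLinks(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);

        var links = new List<string>();
        // One XPath union; an element matching both sides is returned once.
        XmlNodeList nodes = doc.SelectNodes("//a[@href] | //*[@data-url]");
        if (nodes == null) return links;

        foreach (XmlNode node in nodes)
        {
            var element = (XmlElement)node;
            if (element.Name == "a" && element.HasAttribute("href"))
                links.Add(element.GetAttribute("href"));
            if (element.HasAttribute("data-url"))
                links.Add(element.GetAttribute("data-url"));
        }
        return links;
    }

    public static void Main()
    {
        const string markup =
            "<html><body>" +
            "<a href=\"photo.jpg\">photo</a>" +
            "<div class=\"cbreplay\" data-url=\"2012\\edmonton\\partien.pgn\"></div>" +
            "<span data-url=\"notes.pdf\"></span>" +
            "</body></html>";

        foreach (string link in CollectLinks(markup))
            Console.WriteLine(link);
    }
}
```

Running it prints the three URLs from the sample markup. With HtmlAgilityPack the same expression string should work when passed to `DocumentNode.SelectNodes`, keeping the null check on the result.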

    It may be more flexible to use `*` instead of `div`: `"//*[@data-url]"`. Those darn HTML authors keep changing their HTML! – user3791372 Jan 04 '17 at 17:00