
I need to collect the relevant links from a URL. For example, from a page like http://beechplane.wordpress.com/ , I need to collect the links that point to the actual articles, i.e. links like http://beechplane.wordpress.com/2012/11/07/the-95-confidence-of-nate-silver/ , http://beechplane.wordpress.com/2012/03/06/visualizing-probability-roulette/ , etc.

How can I get those links in Java? Is this possible using a web crawler?

Kara
Dinoop Nair

1 Answer


I use the jsoup library for that.

How to get all <a> tags from a document:

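import org.jsoup.Jsoup;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://beechplane.wordpress.com/").get(); // fetch and parse the page
Elements links = doc.select("a[href]"); // every <a> tag that has an href attribute
for (Element el : links) {
    String href = el.attr("abs:href"); // absolute URL, resolved against the page address
    // process href
}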
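Selecting every <a> tag also returns navigation, archive, and comment links. To keep only the articles, one option is to filter the collected href values by their URL shape. The following is a minimal sketch, not part of jsoup: the class name ArticleLinkFilter and the method filterArticleLinks are hypothetical, and the pattern host/yyyy/mm/dd/slug/ is an assumption based only on the two example links in the question. Other blogs may use a different permalink layout.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: keeps only links that look like dated article permalinks.
public class ArticleLinkFilter {

    // Assumed permalink shape: http(s)://<host>/<yyyy>/<mm>/<dd>/<slug>/
    // Links with a query string or fragment (e.g. "#comments") do not match.
    private static final Pattern ARTICLE_URL =
            Pattern.compile("^https?://[^/]+/\\d{4}/\\d{2}/\\d{2}/[^/?#]+/?$");

    // Returns the matching links in their original order, without duplicates.
    public static List<String> filterArticleLinks(List<String> hrefs) {
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href != null && ARTICLE_URL.matcher(href).matches()) {
                unique.add(href);
            }
        }
        return new ArrayList<>(unique);
    }
}
```

To use it, collect each absolute href from the loop above into a List<String> and pass that list to ArticleLinkFilter.filterArticleLinks. A LinkedHashSet is used because the same article is often linked several times on one page (title, "read more", thumbnail).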
zella