0

I have an HTML snippet like this:

<a href="XXXXXXXXXXXXXXX" target="_blank" class="view_job_link">View or apply to job</a>

I want to read the href value (XXXXXXXXXX) using Java.

Point to note: I am reading the HTML file from a URL using InputStreamReader(url.openStream()).

I am getting the complete HTML file, and the above snippet is part of that file.

How can I do this?

Thanks

Karunjay Anand

Michael Myers
geekIndiana
  • I notice you tagged your question 'regex'. Please read this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Amber Aug 14 '10 at 17:52
  • Possible duplicate of [Java: I have a big string of html and need to extract the href="..." text...](http://stackoverflow.com/questions/1670593/java-i-have-a-big-string-of-html-and-need-to-extract-the-href-text) – kennytm Aug 14 '10 at 17:53

3 Answers

3

Use an HTML parser like Jsoup. The API is easy to learn, and for your case the following code snippet will do:

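import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

URL url = new URL("http://example.com/");
Document doc = Jsoup.parse(url, 3 * 1000); // fetch and parse, 3-second timeout
Elements links = doc.select("a[href]"); // every <a> that has an href
for (Element link : links) {
    System.out.println("Href = " + link.attr("abs:href")); // abs: resolves relative URLs
}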
chedine
1

Use an HTML parser like TagSoup or something similar.

Taylor Leese
0

You can use Java's own HTMLEditorKit to parse HTML. This way you won't need to depend on any third-party HTML parser. Here is an example of how to use it.
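
A minimal sketch of this approach, assuming the Swing HTML parser (ParserDelegator with an HTMLEditorKit.ParserCallback) is acceptable for your pages. The class name HrefExtractor, the method extractHrefs, and the example.com URL are hypothetical names chosen for illustration. It collects the href of every <a> tag whose class attribute matches the one in the question:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the JDK.
class HrefExtractor {

    // Returns the href of every <a> tag whose class attribute equals cssClass.
    static List<String> extractHrefs(Reader html, String cssClass) throws IOException {
        List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.A) {
                    return; // only anchors are of interest
                }
                Object cls = attrs.getAttribute(HTML.Attribute.CLASS);
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href != null && cls != null && cssClass.equals(cls.toString())) {
                    hrefs.add(href.toString());
                }
            }
        };
        // The third argument tells the parser to ignore charset declarations in the page.
        new ParserDelegator().parse(html, callback, true);
        return hrefs;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder markup modeled on the snippet in the question.
        String html = "<a href=\"http://example.com/job\" target=\"_blank\" "
                + "class=\"view_job_link\">View or apply to job</a>";
        System.out.println(extractHrefs(new StringReader(html), "view_job_link"));
    }
}
```

To run it against the live page, the InputStreamReader(url.openStream()) from the question can be passed in place of the StringReader. The comparison is an exact match on the class attribute, so an element carrying several classes would need a looser check.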

Gopi