Read href inside anchor tag using Java

Question

I have an HTML snippet like this:

<a href="XXXXXXXXXXXXXXX" target="_blank" class="view_job_link">View or apply to job</a>

I want to read the href value (XXXXXXXXXX) using Java.

Point to note: I am reading the HTML from a URL using new InputStreamReader(url.openStream()).

I get the complete HTML file, and the snippet above is one part of it.

How can I do this?

Thanks

Karunjay Anand

I notice you tagged your question 'regex'. Please read this. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Amber, Aug 14 '10 at 17:52
possible duplicate of [Java: I have a big string of html and need to extract the href="..." text...](http://stackoverflow.com/questions/1670593/java-i-have-a-big-string-of-html-and-need-to-extract-the-href-text) — kennytm, Aug 14 '10 at 17:53

score 3 · Accepted Answer · answered Aug 14 '10 at 18:11

Use an HTML parser like Jsoup. The API is easy to learn, and for your case the following snippet will do:

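import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
URL url = new URL("http://example.com/");
Document doc = Jsoup.parse(url, 3 * 1000); // fetch and parse, with a 3-second timeout
Elements links = doc.select("a[href]"); // every <a> element that has an href attribute
for (Element link : links) {
    // The "abs:" prefix resolves a relative href against the document's base URI
    System.out.println("Href = " + link.attr("abs:href"));
}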

chedine

You can also use `link.absUrl("href")`. – BalusC Aug 14 '10 at 18:17

score 1 · Answer 2 · answered Aug 14 '10 at 17:52

Use an HTML parser like TagSoup or something similar.

Taylor Leese

score 0 · Answer 3 · answered Aug 14 '10 at 18:33

You can use Java's own HTMLEditorKit (javax.swing.text.html) for parsing HTML. This way you won't need to depend on any third-party HTML parser. Here is an example of how to use it.
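
As a rough sketch of this approach, the code below uses HTMLEditorKit.ParserCallback together with ParserDelegator from the JDK to collect the href of every anchor tag. The class name HrefExtractor, the method extractHrefs, and the sample URL in main are made up for illustration; they are not part of any library or of the original question.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, named here only for illustration.
public class HrefExtractor {

    /** Returns the href value of every <a> tag in the HTML supplied by the reader. */
    public static List<String> extractHrefs(Reader html) throws IOException {
        final List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) { // skip anchors without an href, e.g. <a name="...">
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        // Third argument: ignore any charset declared inside the document,
        // since the Reader has already decoded the bytes.
        new ParserDelegator().parse(html, callback, true);
        return hrefs;
    }

    public static void main(String[] args) throws IOException {
        // Sample markup modelled on the snippet in the question; the URL is a placeholder.
        String html = "<html><body><a href=\"http://example.com/job/1\" target=\"_blank\" "
                + "class=\"view_job_link\">View or apply to job</a></body></html>";
        for (String href : extractHrefs(new StringReader(html))) {
            System.out.println("Href = " + href);
        }
    }
}
```

To use it with the page from the question, pass new InputStreamReader(url.openStream()) instead of the StringReader. If only links with class view_job_link are wanted, one option is to also check attrs.getAttribute(HTML.Attribute.CLASS) inside handleStartTag.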

Gopi
