I'm wondering how I could extract '4151' from the following code:

</th><td><a class="external exitstitial" rel="nofollow" href="http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151">Look up price</a>

I would like to use a regex, but if there is a better way I'm open to it!

  • Assuming that's just a fragment of a complete (X)HTML document, use XPath first to obtain the attribute value, _then_ a regular expression to extract the query parameter. – Alistair A. Israel Aug 11 '11 at 08:58
  • I've already done all of that, I just need the regex to extract it. –  Aug 11 '11 at 09:02

3 Answers

The following works for me, assuming the href attribute value was already extracted:

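import java.util.regex.Matcher;
import java.util.regex.Pattern;

String href = "http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151";
// Capture the digits that follow "?obj="
Pattern p = Pattern.compile("\\?obj=(\\d+)");
Matcher m = p.matcher(href);
if (m.find()) {
    System.out.println(m.group(1)); // group 1 holds the captured digits
}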

This prints 4151.
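If a regex is not a hard requirement, the query string can also be read with the standard library. The following is only a sketch of that alternative, not part of the answer above: the class name `QueryParam` and the method `get` are made up for illustration, and it assumes the href is a syntactically valid URI with simple `name=value` pairs separated by `&`.

```java
import java.net.URI;

class QueryParam {
    // Hypothetical helper: returns the value of the named query parameter,
    // or null if the URL has no query string or no such parameter.
    static String get(String url, String name) {
        String query = URI.create(url).getQuery(); // "obj=4151" for the URL in the question
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String href = "http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151";
        System.out.println(get(href, "obj"));
    }
}
```

Unlike the `\\?obj=` pattern, this lookup does not depend on `obj` being the first parameter in the query string.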

Alistair A. Israel

Here are a few parser libraries: htmlparser, jsoup, and jtidy.

In your case a regex may be fine, but here's a classic post on why you should avoid regex for HTML parsing.

asgs
This regex would get you the number:

String resultString = null;
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    // group() returns the first run of digits found in subjectString
    resultString = regexMatcher.group();
}

This code is not tested and presumes your HTML string is assigned to the 'subjectString' variable.
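One caveat when matching against the whole HTML string: a bare `\d+` returns the first run of digits anywhere in the input, so any number that appears earlier in the markup would be picked up instead. A minimal sketch of a more anchored variant follows; the class name `ObjIdFromHtml`, the method `extract`, and the `colspan` attribute in the sample input are assumptions added for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ObjIdFromHtml {
    // Match digits only when they directly follow the parameter name "obj=".
    // \b keeps names such as "myobj=" from matching.
    private static final Pattern OBJ_ID = Pattern.compile("\\bobj=(\\d+)");

    // Hypothetical helper: returns the obj value, or null if none is found.
    static String extract(String html) {
        Matcher m = OBJ_ID.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample input with an extra number before the link (assumed markup).
        String html = "<td colspan=\"2\"><a href=\"http://services.runescape.com/"
                + "m=itemdb_rs/viewitem.ws?obj=4151\">Look up price</a></td>";
        // A bare \d+ would match the "2" in colspan first; the anchored pattern does not.
        System.out.println(extract(html));
    }
}
```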

ipr101