I'm working on a project with hyperlinks. I need to extract all links from a string in Java, but only those pointing to http://rapidshare.com.

All parsed links should be collected in an array. My code looks like this:

Matcher mat = Pattern.compile("(\"(.*?)\"|([^,]+)),?").matcher(html);

But it still matches other words, brackets, and links. How can I get this working?

Update: revised source code

Matcher mat = Pattern.compile("/href=\\\"(http://(www\\.)?rapidshare.com/.+)\\\"/").matcher(html);

while (mat.find()) {                        
    result.add(mat.group(2) == null ? mat.group(3) : mat.group(2));                 
}
  • I'd start off with a regex like: `/href=\"(http://(www\.)?rapidshare.com/.+)\"/`. As @Joeblackdev says, use an online checker to get it correct - and let us know what you come up with! – halfer Apr 02 '12 at 17:29
  • Btw, I expect you've been downvoted as people here generally prefer question-askers to give something a good go first. Why not do that now, and add your first try into your question? `:)` – halfer Apr 02 '12 at 17:36
  • I have now tried the regex on RegexPlanet, but I don't get any links. I updated my source code. – user1308342 Apr 02 '12 at 19:09
  • OK, I made a couple of mistakes before - the online tool shows that you don't need start and end characters (/) and I missed out escaping the dot in `rapidshare.com`. So it would be something like `href=\"(http://(www\.)?rapidshare\.com/.+)\"`. But... don't just copy what I have - debug it using the online tool! – halfer Apr 02 '12 at 21:53
  • Hi, I think "#http://rapidshare\\.com/files/(.*?)/([^\\s]+)#" is the correct regex, and RegexPlanet counts two groups, but when I try to match them in a loop it throws an exception. – user1308342 Apr 03 '12 at 08:43
  • Why not check out [regex planet](http://www.regexplanet.com/advanced/java/index.html) where you can test out your pattern? There are also other patterns there which may help you. – Joeblackdev Apr 02 '12 at 17:13

1 Answer

I am using this JavaScript regexp in production in my Firefox add-on:

(?:h..ps?://)?(?:www\.)?rapidshare\.com/files/([0-9]+)/([^\s<"/]{1,500})/?

The popular open-source Java application JDownloader uses this:

//    Copyright (C) 2008  JD-Team support@jdownloader.org
"http://[\\w\\.]*?rapidshare\\.com/files/\\d+/?(.*?)($|\\?)"

These two regular expressions are specifically for file links. They require a file name because the API requires a file name.
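
As a rough sketch of how such a pattern could be used from Java to fill a list: the class name `RapidshareLinks`, the method `extract`, and the sample HTML are made up for illustration, and the pattern is a simplified variant of the ones above, not the exact expression used by either project. Two things to note. Java's `Pattern.compile` takes the bare expression, so delimiters such as `/.../` or `#...#` would be matched as literal characters. And `mat.group(n)` throws `IndexOutOfBoundsException` when the pattern has fewer than `n` groups, which is probably the exception seen when calling `group(3)` on a two-group pattern; `mat.group()` returns the whole match and always exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RapidshareLinks {

    // Illustrative pattern (an assumption, simplified from the ones above):
    // group 1 = numeric file id, group 2 = file name.
    // No leading/trailing delimiters; dots in the host name are escaped.
    private static final Pattern FILE_LINK = Pattern.compile(
            "http://(?:www\\.)?rapidshare\\.com/files/(\\d+)/([^\\s<\"/]{1,500})");

    // Collects every matching link in the order it appears in the input.
    public static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher mat = FILE_LINK.matcher(html);
        while (mat.find()) {
            // group() is the whole match; group(1) and group(2) would give
            // the file id and file name if only those parts are needed.
            result.add(mat.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input: two rapidshare links and one unrelated link.
        String html = "<a href=\"http://rapidshare.com/files/123/a.zip\">a</a> "
                + "<a href=\"http://example.com/files/1/b.zip\">b</a> "
                + "<a href=\"http://www.rapidshare.com/files/456/c.rar\">c</a>";
        System.out.println(extract(html));
    }
}
```

If a plain array is needed instead of a list, `result.toArray(new String[0])` converts it.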