How to use Regex to parse Links

Question

I'm working on a project with hyperlinks. I need to parse all links from a string in Java, but only http://rapidshare.com links.

All parsed links should be stored in an array. My code looks like this:

Matcher mat = Pattern.compile("(\"(.*?)\"|([^,]+)),?").matcher(html);

But it still matches other words, brackets, and links. How can I get this working?

Update to the source code:

Matcher mat = Pattern.compile("/href=\\\"(http://(www\\.)?rapidshare.com/.+)\\\"/").matcher(html);

while (mat.find()) {                        
    result.add(mat.group(2) == null ? mat.group(3) : mat.group(2));                 
}

I'd start off with a regex like: `/href=\"(http://(www\.)?rapidshare.com/.+)\"/`. As @Joeblackdev says, use an online checker to get it correct - and let us know what you come up with! — halfer, Apr 02 '12 at 17:29
Btw, I expect you've been downvoted as people here generally prefer question-askers to give something a good go first. Why not do that now, and add your first try into your question? `:)` — halfer, Apr 02 '12 at 17:36
I tried the regex on RegexPlanet, but I don't receive any links. I have updated my source code. — user1308342, Apr 02 '12 at 19:09
OK, I made a couple of mistakes before - the online tool shows that you don't need start and end characters (/) and I missed out escaping the dot in `rapidshare.com`. So it would be something like `href=\"(http://(www\.)?rapidshare\.com/.+)\"`. But... don't just copy what I have - debug it using the online tool! — halfer, Apr 02 '12 at 21:53
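A minimal Java sketch of halfer's suggestion, assuming double-quoted `href` attributes and a lowercase host name. One deliberate change: `[^\"]+` replaces `.+`, because a greedy `.+` runs on to the last double quote on the same line and would swallow the rest of the tag and any markup after it. `HrefLinkExtractor` and `extract` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class HrefLinkExtractor {
    // halfer's pattern without the /.../ delimiters.
    // Assumption: [^"]+ instead of .+ so each match stops at the closing quote of the attribute.
    private static final Pattern RAPIDSHARE_HREF =
            Pattern.compile("href=\"(http://(www\\.)?rapidshare\\.com/[^\"]+)\"");

    public static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher mat = RAPIDSHARE_HREF.matcher(html);
        while (mat.find()) {
            // Group 1 is the whole URL; group 2 is only the optional "www." prefix.
            result.add(mat.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://rapidshare.com/files/123/a.zip\">A</a> "
                + "<a href=\"http://example.com/b\">B</a> "
                + "<a href=\"http://www.rapidshare.com/files/456/b.rar\">C</a>";
        System.out.println(extract(html));
    }
}
```

Calling `extract(html).toArray(new String[0])` gives the array the question asks for. In this pattern `group(2)` is null whenever the link has no `www.`, and there is no `group(3)`, so reading `group(1)` is the simplest way to get the URL.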
Hi, I think `"#http://rapidshare\\.com/files/(.*?)/([^\\s]+)#"` is the correct regex. RegexPlanet counts two groups, but when I try to match them in a loop it throws an exception. — user1308342, Apr 03 '12 at 08:43
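Java's `Pattern` has no delimiter syntax, so the leading and trailing `#` in that regex are matched as literal characters and a plain link will not match. Below is a sketch with the delimiters removed that reads only the two groups the pattern defines. `FileLinkGroups` and `extract` are hypothetical names. Note that `[^\s]+` stops only at whitespace, so on raw HTML it would also pick up a closing `">` and whatever follows up to the next space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class FileLinkGroups {
    // The regex from the comment, minus the '#' delimiters.
    // Without them, '#' would have to appear literally in the input.
    private static final Pattern FILE_LINK =
            Pattern.compile("http://rapidshare\\.com/files/(.*?)/([^\\s]+)");

    // Returns {fileId, fileName} pairs for every link found in the text.
    public static List<String[]> extract(String text) {
        List<String[]> result = new ArrayList<>();
        Matcher mat = FILE_LINK.matcher(text);
        while (mat.find()) {
            // The pattern defines exactly two groups (mat.groupCount() == 2);
            // mat.group(3) would throw IndexOutOfBoundsException.
            result.add(new String[] {mat.group(1), mat.group(2)});
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "see http://rapidshare.com/files/123/a.zip and http://rapidshare.com/files/456/b.rar";
        for (String[] link : extract(text)) {
            System.out.println("id=" + link[0] + " name=" + link[1]);
        }
    }
}
```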
Why not check out [regex planet](http://www.regexplanet.com/advanced/java/index.html) where you can test out your pattern? There are also other patterns there which may help you. — Joeblackdev, Apr 02 '12 at 17:13

score 0 · Answer 1 · answered Apr 05 '12 at 12:57

I am using this JavaScript regexp in production in my Firefox add-on:

(?:h..ps?://)?(?:www\.)?rapidshare\.com/files/([0-9]+)/([^\s<"/]{1,500})/?
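The same expression can be used from Java. Porting it means doubling each backslash and escaping the double quote inside the character class as `\"`. The optional `h..ps?://` prefix also accepts obfuscated schemes such as `hxxp://`. This sketch assumes the pattern is applied with `Matcher.find()` over the page text; `AddonPatternInJava` and `extract` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class AddonPatternInJava {
    // The JavaScript regexp from the answer as a Java string literal:
    // backslashes doubled, the '"' inside the character class escaped.
    private static final Pattern FILE_LINK = Pattern.compile(
            "(?:h..ps?://)?(?:www\\.)?rapidshare\\.com/files/([0-9]+)/([^\\s<\"/]{1,500})/?");

    public static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher mat = FILE_LINK.matcher(html);
        while (mat.find()) {
            // group() is the whole link; group(1) is the numeric file id, group(2) the file name.
            links.add(mat.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://rapidshare.com/files/123/a.zip\">A</a> "
                + "hxxp://www.rapidshare.com/files/456/b.rar";
        String[] array = extract(html).toArray(new String[0]);
        for (String link : array) {
            System.out.println(link);
        }
    }
}
```

Excluding `<`, `"` and `/` from the file-name group is what keeps a match from running into the surrounding HTML.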

The popular open-source Java program JDownloader uses this:

//    Copyright (C) 2008  JD-Team support@jdownloader.org
"http://[\\w\\.]*?rapidshare\\.com/files/\\d+/?(.*?)($|\\?)"
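A sketch of applying that pattern string in Java to one URL at a time, which is an assumption about how it is meant to be used: without `MULTILINE`, `$` matches only at the end of the input (or just before a final line terminator), so over a whole page the lazy `(.*?)` could only end at a `?` or at the end of the input. `lookingAt()` anchors the match at the start of the URL. `JDownloaderPatternDemo` and `fileName` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class JDownloaderPatternDemo {
    // Pattern string as quoted from JDownloader in the answer.
    private static final Pattern RAPIDSHARE_FILE =
            Pattern.compile("http://[\\w\\.]*?rapidshare\\.com/files/\\d+/?(.*?)($|\\?)");

    // Hypothetical helper: file-name part of a single URL, or null if it is not a RapidShare file link.
    public static String fileName(String url) {
        Matcher mat = RAPIDSHARE_FILE.matcher(url);
        // lookingAt() anchors at the start; ($|\?) ends the name at '?' or at end of input.
        return mat.lookingAt() ? mat.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fileName("http://rapidshare.com/files/123/a.zip"));
        System.out.println(fileName("http://rs12.rapidshare.com/files/456/b.rar?x=1"));
        System.out.println(fileName("http://example.com/files/789/c.zip"));
    }
}
```

The lazy `[\w\.]*?` before `rapidshare` is what lets subdomains such as `rs12.` through.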

Both regular expressions are specific to file links. They require a file name because the RapidShare API requires one.

