Regex in WWW::Mechanize in Perl

Question

I am not sure what the correct syntax is for the url_regex parameter used in WWW::Mechanize.

I am collecting all the links from a web page that start with http:// and have the following format:

http://google.com

and not:

http://google.com/dir/
http://google.com/dir/dir2/

So, I use the following:

@links = $mech->find_all_links(url_regex => qr/^http:\/\/.*?\//);

But this still captures the URLs with subpaths in them.

I have tested my regex on regexpal.com and it works well there, so it seems that url_regex expects a different syntax.

Thanks.

Ωmega · Accepted Answer · 2012-06-28T19:12:14.360


You should use:

@links = $mech->find_all_links(url_regex => qr/^http:\/\/[^\/]*\/?$/);

which reads:

The string has to start (^) with http://, followed by any combination (even empty) of characters other than a slash ([^\/]*), followed by an optional slash (\/?) at the end ($).
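To try the pattern without fetching a page, here is a minimal sketch that applies it to a plain list of strings. The sample URLs and the names @urls, $host_only and @kept are made up for illustration; in real code the pattern would still be passed as url_regex. It is written with qr{...} delimiters, which should be equivalent to the version above but needs no escaped slashes.

```perl
use strict;
use warnings;

# Hypothetical sample URLs standing in for the links found on a page.
my @urls = (
    'http://google.com',
    'http://google.com/',
    'http://google.com/dir/',
    'http://google.com/dir/dir2/',
    'https://google.com',
);

# Same pattern as in the answer, with {} delimiters so "/" needs no backslash.
my $host_only = qr{^http://[^/]*/?$};

# Keep only the URLs that consist of a scheme, a host, and an optional trailing slash.
my @kept = grep { $_ =~ $host_only } @urls;

print "$_\n" for @kept;
```

With these inputs only http://google.com and http://google.com/ should remain. The https:// entry is dropped because the pattern requires a literal http://.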

edited Jun 28 '12 at 19:12

answered Jun 28 '12 at 17:41


Thank you. It works. Could you please explain your regex a bit more? My regex was not working because the dot also matches a forward slash, so we need to exclude the slash with a negated character class. Is that the reason? – Neon Flash Jun 28 '12 at 19:02
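One way to check this is to print what the question's pattern actually consumes. This is a small sketch with made-up sample URLs; the names $original and %matched are only for illustration. The lazy .*? already stops at the first slash, so the dot matching a slash is only part of the story. The larger problem is that the pattern has no $ at the end, so it only needs to match a prefix of the URL.

```perl
use strict;
use warnings;

# Pattern from the question: no end anchor, lazy dot.
my $original = qr/^http:\/\/.*?\//;

my %matched;
for my $url ('http://google.com',
             'http://google.com/dir/',
             'http://google.com/dir/dir2/') {
    # $& holds the part of the string that the last successful match consumed.
    $matched{$url} = $url =~ $original ? $& : undef;
}

for my $url (sort keys %matched) {
    printf "%-30s => %s\n", $url, $matched{$url} // '(no match)';
}
```

Both URLs with subpaths match on the prefix http://google.com/, which is why they are still returned, while http://google.com without a trailing slash does not match at all because the pattern requires a slash after the host.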
