
I am not sure what the correct syntax is for the url_regex option used in WWW::Mechanize.

I am collecting all the links from a web page that start with http:// and have the following format:

http://google.com

and not,

http://google.com/dir/
http://google.com/dir/dir2/

So, I use the following:

@links=$mech->find_all_links(url_regex=>qr/^http:\/\/.*?\//)

And this still captures the URLs with sub paths in them.

I have tested my regex on regexpal.com and it works well there. But for some reason, url_regex seems to expect a different syntax.

Thanks.

Neon Flash

1 Answer


You should use:

@links=$mech->find_all_links(url_regex=>qr/^http:\/\/[^\/]*\/?$/) 

which reads:

The string must start (^) with http://, followed by any number of characters (possibly none) other than a slash ([^\/]*), followed by an optional slash (\/?) at the end of the string ($).
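To see the difference without fetching a page, here is a minimal sketch that runs both patterns against the three sample URLs from the question. It uses plain Perl pattern matching rather than WWW::Mechanize, on the assumption that url_regex applies the pattern to each link's URL as an ordinary regex match; the variable names $original and $fixed are only illustrative.

```perl
use strict;
use warnings;

# Sample URLs taken from the question.
my @urls = (
    'http://google.com',
    'http://google.com/dir/',
    'http://google.com/dir/dir2/',
);

# Pattern from the question: there is no end anchor, so it succeeds as soon
# as it finds "http://", some characters, and a slash anywhere after that.
my $original = qr/^http:\/\/.*?\//;

# Pattern from the answer: [^\/]* cannot cross a slash, and $ forces the
# match to reach the end of the URL.
my $fixed = qr/^http:\/\/[^\/]*\/?$/;

for my $url (@urls) {
    printf "%-30s original=%-8s fixed=%s\n", $url,
        ($url =~ $original ? 'match' : 'no match'),
        ($url =~ $fixed    ? 'match' : 'no match');
}
```

With these inputs, the original pattern matches the two URLs that contain a path and does not match http://google.com (which has no slash after the host), while the fixed pattern matches only http://google.com. The main cause appears to be the missing $ anchor: an unanchored pattern only needs to match a prefix of the URL, so anything after the first slash is ignored.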

Ωmega
  • Thank you. It works. Could you please explain your regex a bit more? My regex was not working since dot would match a forward slash character also, so we need to negate it in the character class. Is that the reason? – Neon Flash Jun 28 '12 at 19:02