I'm parsing a website with WWW::Mechanize to download some images. I need to populate an array with the links for each available resolution of a given image, but only with links whose resolution is 1440x900 or smaller. I'm not sure where to do that validation. I was trying this:

my @resolutions = map {$_->url} grep {$_->url =~ /$_[0]-\d{4,4}x\d{3,4}/} $mech->find_all_links();

How can I do that validation so that I only get images of 1440x900 or smaller?

Thanks in advance!

Edit:

I can't use Mech's find_all_images method because the links end in .html. For instance, the links look like this:

http://www.wallpaper.com/view/some_image-2560x1600.html

The .jpg image is displayed only after following one of these links.

XVirtusX
  • Are all image names denoted by their resolution consistently? If yes then I would look through the images method of WWW::Mechanize, which automatically gathers all the image links on the page, then extract the two pieces of information you need with regex and finally push the url to the array if it fits your conditions... (If you will provide a little more info and will want to, I will be able to write up some code too) – cyber-guard Jul 14 '13 at 14:05
  • Please show how target link must look like (html code). – gangabass Jul 14 '13 at 14:05
  • When you installed `WWW::Mechanize`, you also installed `HTML::TreeBuilder` and `HTML::Element` http://metacpan.org/module/HTML::Element With it, you can `look_down()` to find the links and `right()` to find its sibling. – shawnhcorey Jul 14 '13 at 14:59

1 Answer

use 5.014;
for my $link ($mech->find_all_links(url_abs_regex => qr/\d+x\d+\.html$/a)) {
    my ($w, $h) = $link->url =~ /(\d+)x(\d+)/a;
    if ($w <= 1440 && $h <= 900) {
        # do something
    }
}
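
To end up with the array the question asks for, the same check can be folded into a grep. This is a minimal sketch, assuming every resolution link ends in -WIDTHxHEIGHT.html. The helper name fits_within and all sample URLs except the first are made up for illustration, and plain strings stand in for the link URLs so the snippet runs without a network connection:

```perl
use 5.014;
use strict;
use warnings;

# Hypothetical helper: true if the URL ends in -WIDTHxHEIGHT.html and
# both dimensions are within the given limits.
sub fits_within {
    my ($url, $max_w, $max_h) = @_;
    my ($w, $h) = $url =~ /-(\d+)x(\d+)\.html$/a
        or return 0;    # no resolution at the end of the URL
    return $w <= $max_w && $h <= $max_h;
}

# Sample data: the first URL is from the question, the rest are invented.
my @urls = (
    'http://www.wallpaper.com/view/some_image-2560x1600.html',
    'http://www.wallpaper.com/view/some_image-1440x900.html',
    'http://www.wallpaper.com/view/some_image-1280x800.html',
    'http://www.wallpaper.com/view/about.html',
);

# Keep only the links whose resolution is 1440x900 or smaller.
my @resolutions = grep { fits_within($_, 1440, 900) } @urls;
say for @resolutions;
```

With a live $mech, the strings would presumably come from something like map { $_->url_abs } $mech->find_all_links(url_abs_regex => qr/\d+x\d+\.html$/a) instead of the hard-coded list.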
daxim
  • Just one thing: how does Perl figure out which value should be assigned to $w and which to $h in that regex? – XVirtusX Jul 15 '13 at 00:31
  • @XVirtusX: The expression “$link->url =~ /(\d+)x(\d+)/a” returns a list containing the values that were matched by the parenthesised terms. – zgpmax Jul 15 '13 at 05:25
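
A small illustration of that behaviour, using the example URL from the question. In list context a successful match returns the captured groups in left-to-right order, so the first group lands in $w and the second in $h. In scalar context the same match only reports whether it succeeded. The variable names are arbitrary:

```perl
use 5.014;
use strict;
use warnings;

my $url = 'http://www.wallpaper.com/view/some_image-2560x1600.html';

# List context: the match returns the two captured groups in order.
my ($w, $h) = $url =~ /(\d+)x(\d+)/a;
say "width=$w height=$h";    # prints "width=2560 height=1600"

# Scalar context: the same match only returns a true/false value.
my $matched = $url =~ /(\d+)x(\d+)/a;
say "matched=$matched";      # prints "matched=1"

# A failed match returns an empty list, leaving both variables undefined.
my ($x, $y) = 'no resolution here' =~ /(\d+)x(\d+)/a;
say defined $x ? 'x is defined' : 'x is undefined';    # prints "x is undefined"
```

This is why the loop in the answer first restricts the links with url_abs_regex: every link that reaches the comparison is known to contain a resolution, so $w and $h are always defined there.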