I have a list of 2,500 websites and need to grab a thumbnail screenshot of each one. How do I do that? I could try to fetch the sites with Perl's WWW::Mechanize::Firefox, e.g. like this:

  use WWW::Mechanize::Firefox;
  my $mech = WWW::Mechanize::Firefox->new();
  $mech->get('http://google.com');

  my $png = $mech->content_as_png();

How do I do this for all of the different URLs? I want to store the URLs in a file, read them from there, and write the resulting screenshots to another directory.

From the docs:

Returns the given tab or the current page rendered as PNG image. All parameters are optional. $tab defaults to the current tab. If the coordinates are given, that rectangle will be cut out. The coordinates should be a hash with the four usual entries, left, top, width, height. This is specific to WWW::Mechanize::Firefox.

CanSpice
zero
  • [Original answer](http://stackoverflow.com/a/8381303) provided by SO user [gangabass](http://stackoverflow.com/u/347767). – daxim Dec 06 '11 at 22:50

2 Answers

I think I understand... you want to have a list of 2,500 URLs, one on each line, saved in a file. Then you want your script above to open the file, read a line, then retrieve the website? If so, something like this:

    Filename: urls.txt
    ------------------
    www.google.com
    www.cnn.com
    www.msnbc.com
    news.bbc.co.uk
    www.bing.com
    www.yahoo.com

Then the code:

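    use strict;
    use warnings;
    use WWW::Mechanize::Firefox;

    my $outdir = 'screenshots';    # screenshots are written here
    -d $outdir or mkdir $outdir or die "Can't create $outdir: $!";
    my $mech = WWW::Mechanize::Firefox->new();    # create once, reuse for every URL

    open(my $input, '<', 'urls.txt') or die "Can't open urls.txt: $!";

    my $count = 0;
    while (my $url = <$input>) {
      chomp $url;
      next unless length $url;     # skip empty lines
      $mech->get($url);
      my $png = $mech->content_as_png();
      # save as screenshots/1.png, screenshots/2.png, ...
      my $file = "$outdir/" . ++$count . '.png';
      open(my $out, '>:raw', $file) or die "Can't write $file: $!";
      print {$out} $png;
      close($out);
    }
    close($input);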
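Numbered files make it hard to tell which screenshot belongs to which site. Here is a minimal sketch, assuming you would rather name each file after its URL. The helpers `normalize_url` and `url_to_filename` are hypothetical (not part of WWW::Mechanize::Firefox). The first prepends `http://` to scheme-less entries like those in urls.txt, in case the browser does not accept them as-is. The second turns the rest of the URL into a filesystem-safe name. Only core Perl (File::Spec) is used.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: add "http://" when the line has no scheme,
# e.g. "www.cnn.com" -> "http://www.cnn.com".
sub normalize_url {
    my ($url) = @_;
    $url =~ s/^\s+|\s+$//g;    # trim surrounding whitespace (incl. newline)
    return $url =~ m{^[a-z][a-z0-9+.-]*://}i ? $url : "http://$url";
}

# Hypothetical helper: build "<outdir>/<safe-name>.png" from a URL.
sub url_to_filename {
    my ($outdir, $url) = @_;
    (my $name = $url) =~ s{^[a-z][a-z0-9+.-]*://}{}i;    # drop the scheme
    $name =~ s{/+$}{};                                    # drop trailing slashes
    $name =~ s{[^A-Za-z0-9._-]+}{_}g;                     # replace unsafe characters
    return File::Spec->catfile($outdir, "$name.png");
}
```

Inside the loop you would then call `$mech->get(normalize_url($url))` and write the PNG to `url_to_filename('screenshots', $url)`. Note that two URLs differing only in "unsafe" characters map to the same file name, so a counter is still the safer choice if that matters for your list.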
AWT
  • Can the Mechanize `->new()` operation be hoisted out of the `while()` loop? Would that improve the performance of the script? – sarnold Dec 06 '11 at 22:51
  • Indeed it would, good catch. It only needs to be instantiated once. I'll update my answer. – AWT Dec 06 '11 at 22:58
  • Hi torgis, hello sarnold, many thanks for the great reply. I'll test this, and I'm sure it will give good results! – zero Dec 07 '11 at 00:29
Assuming your list is in a file named list.txt:

    open( my $fh, '<', 'list.txt') or die "Could not open list.txt: $!";
    foreach my $url ( <$fh> ) {
        chomp $url;
        # Do your mechanize thing here using $url
    }
    close $fh;

Basically, open the file, then loop through all of its lines.
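With 2,500 sites, some will time out or fail to load. If `$mech->get` dies on such an error, the whole loop stops. Here is a minimal sketch, assuming you want to log failures and carry on. The hypothetical `process_urls` reads the list file one line at a time and skips blank lines. It runs a callback on each URL inside `eval` and collects the URLs that failed. The callback is where the Mechanize code would go.

```perl
use strict;
use warnings;

# Hypothetical helper: call $handler->($url) for every non-blank line of
# $list_file. Errors thrown by the handler are caught so one bad site
# does not abort the batch; returns a list of [url, error] pairs.
sub process_urls {
    my ($list_file, $handler) = @_;
    open(my $fh, '<', $list_file) or die "Could not open $list_file: $!";
    my @failed;
    while (my $url = <$fh>) {    # reads one line at a time
        chomp $url;
        next unless $url =~ /\S/;    # skip blank lines
        eval { $handler->($url); 1 } or do {
            my $err = $@ || 'unknown error';
            push @failed, [ $url, $err ];
            warn "Skipping $url: $err";
        };
    }
    close $fh;
    return @failed;
}
```

Usage would look like `my @failed = process_urls('list.txt', sub { my ($url) = @_; $mech->get($url); ... });`. Afterwards you could write the URLs in `@failed` to a file and retry just those sites.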

CanSpice
  • Hi there, many thanks for the great answer! Overwhelming! Thank you, and greetings. – zero Dec 07 '11 at 00:23