I am trying to download a file from a web page.

First I get the links with HTML::LinkExtractor, and then I want to download them with LWP. I'm a newbie at programming in Perl.

I made the following code ...

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TableExtract;
use HTML::LinkExtractor;
use LWP::Simple qw(get);
use Archive::Zip;

my $html = get $ARGV[0];

my $te = HTML::TableExtract->new(
    keep_html => 1,
    headers => [qw( column1 column2 )],
);
$te->parse($html);

# I get only the first row
my ($row) = $te->rows;

my $LXM = new HTML::LinkExtractor(undef,undef,1);
$LXM->parse(\$$row[0]);
my ($t) = $LXM->links;

my $LXS = new HTML::LinkExtractor(undef,undef,1);
$LXS->parse(\$$row[1]);
my ($s) = $LXS->links;

#-------
for (my $i=0; $i < scalar(@$s); $i++) {
  print "$$s[$i]{_TEXT} $$s[$i]{href} $$t[$i]{href} \n";
  my $file = '/tmp/$$s[$i]{_TEXT}';
  my $url = $$s[$i]{href};
  my $content = getstore($url, $file);
  die "Couldn't get it!" unless defined $content;
}

And I get the following error:

Undefined subroutine &main::getstore called at ./geturlfromtable.pl line 35.

Thanks in advance!

nixv

1 Answer

LWP::Simple can be loaded in two different ways.

use LWP::Simple;

This loads the module and makes all of its functions available to your program.

use LWP::Simple qw(list of function names);

This loads the module and only makes available the specific set of functions you have requested.

You have this code:

use LWP::Simple qw(get);

This makes the get() function available, but not the getstore() function.

To fix this, either add getstore() to your list of functions:

use LWP::Simple qw(get getstore);

Or (probably simpler) remove the list of functions:

use LWP::Simple;
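
The effect of an import list can be seen without touching the network. The sketch below uses the core module File::Basename as a stand-in for LWP::Simple (an assumption made only so the example runs anywhere); like LWP::Simple, it exports several functions by default, and an explicit list restricts the import to the names given.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ask for basename() only. dirname() is exported by default,
# but it is not imported here because it is not in the list.
use File::Basename qw(basename);

# main->can($name) is true only if a sub of that name exists in package main.
my %imported = map { $_ => (main->can($_) ? 1 : 0) } qw(basename dirname);

for my $name (sort keys %imported) {
    print "$name: ", ($imported{$name} ? "imported" : "not imported"), "\n";
}
```

Calling dirname() in this program would fail with the same kind of "Undefined subroutine &main::dirname" error as in the question.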

Update: I hope you don't mind if I add a couple of style points.

Firstly, you're using a really old module - HTML::LinkExtractor. It hasn't been updated for almost fifteen years. I'd recommend looking at HTML::LinkExtor instead.

Secondly, your code uses a lot of references, but you're using them in a really over-complicated way. For example, where you have \$$row[0], the equivalent \$row->[0] is easier to read. Similarly, $$s[$i]{href} will be easier for most people to understand if written as $s->[$i]{href}.
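
As a quick check that the two spellings really are interchangeable, here is a minimal sketch. The contents of $s and $row are made-up data shaped like the structures in the question, not real output from HTML::LinkExtractor.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data shaped like the link lists in the question.
my $s   = [ { _TEXT => 'file1.zip', href => 'http://example.com/file1.zip' } ];
my $row = [ '<a href="http://example.com/file1.zip">file1.zip</a>' ];

# Both spellings read the same hash value from the same array element.
my $same_value = $$s[0]{href} eq $s->[0]{href};

# Both spellings take a reference to the same scalar, the first element of @$row.
my $same_ref = \$$row[0] == \$row->[0];

print "same value: ", ($same_value ? "yes" : "no"), "\n";
print "same ref:   ", ($same_ref   ? "yes" : "no"), "\n";
```

The arrow form makes it obvious which variable is being dereferenced, which is why it is usually preferred.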

Next, you use the C-style for loop and iterate over the array's indexes. It's usually simpler to use foreach to iterate from zero to the last index in the array.

foreach my $i (0 .. $#$s) {
  print "$s->[$i]{_TEXT} $s->[$i]{href} $t->[$i]{href} \n";
  my $file = "/tmp/$s->[$i]{_TEXT}";
  my $url = $s->[$i]{href};
  my $content = getstore($url, $file);
  die "Couldn't get it!" unless defined $content;
}

And finally, you seem slightly confused about what getstore() returns. It returns the HTTP response code. So it will never be undefined. If there's a problem retrieving the content, you'll get 500 or 403 or something like that.
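
A minimal sketch of checking that return value, assuming the default `use LWP::Simple;` import (which also provides is_success()). The URL and file name are placeholders, not values from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;    # default import: get(), getstore(), is_success(), ...

# Placeholder values for illustration only.
my $name = 'file1.zip';
my $url  = "http://example.com/$name";
my $file = "/tmp/$name";    # double quotes, so $name is interpolated

# getstore() returns the numeric HTTP status code, not the content.
my $rc = getstore($url, $file);

if (is_success($rc)) {
    print "Saved $url to $file\n";
}
else {
    warn "Couldn't get $url: HTTP status $rc\n";
}
```

Testing the status with is_success() catches failed downloads that a `defined` check would let through.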

Dave Cross
  • Thanks Dave! Now I have problems with the variables ... I want the file to be saved as the _TEXT of the url `my $file = '/tmp/$$s[$i]{_TEXT}';` surely it is badly declared – nixv Nov 22 '19 at 13:43
  • @nixv: Well, yes. At the very least, you're going to want to put that string in double-quotes if you want the variables to be expanded. – Dave Cross Nov 22 '19 at 14:44
  • Thanks! and for your style points too!!! Now, I have problems decompressing downloaded files :( – nixv Nov 25 '19 at 14:15