
I am having difficulty recovering data from within an HTML table. Here is what I have.

use strict; 
use warnings;
use HTML::TreeBuilder;
use HTML::TableExtract qw(tree); #
use WWW::Mechanize;

my $d = 3; 
my $c = 4; 

$te = HTML::TableExtract->new( depth => $d, count => $c ); # , decode => 1, gridmap => 1
$te->parse($mech->content);
print "\nDepth = $d, Count = $c \n\n";
my $table = $te->first_table_found;
my $table_tree = $table->tree();
my @rows = $table->rows();
print "The row count is   : ".$rowcount,"\n";
print "The column count is: ".$colcount,"\n";
foreach my $row (@rows)
{
   my @read_row = $table->tree->row($row);
   foreach my $read (@read_row)
   {
      print $read, "\n";
   }
}

I get this as the error message.

"Rows(ARRAY(0x2987ef8)) out of range at test4.pl line 91."

Is there a better way of looking through the table and getting the values? I have no headers to look for. I looked at HTML::Query but couldn't find it, or the Badger::Base it requires, through PPM, and HTML::Element looks better suited to table construction. I'm also using WWW::Mechanize earlier in the script.

Any help on my code above would be appreciated.

MicrobicTiger

1 Answer


You don't really need tree extraction mode for most purposes.

Please always use strict and use warnings at the top of every Perl program you write, and declare your variables as close as possible to their first point of use.

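Your call $table->rows() returns a list of array references, one per row. The error arises because your loop passes each of those array references to $table->tree->row(), where a row index is expected. You can access the rows directly like this: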

my $te = HTML::TableExtract->new(depth => $d, count => $c); # , decode => 1, gridmap => 1
$te->parse($mech->content);
printf "\nDepth = %d, Count = %d\n\n", $d, $c;

my $table = $te->first_table_found;
my @rows = $table->rows;

for my $row (@rows) {
  print join(', ', @$row), "\n";
}
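
If some cells come back undefined, join will emit "uninitialized value" warnings under use warnings. The sketch below shows one way to guard against that. It assumes the warnings mentioned in the comments below come from undef cells, and it uses a hard-coded @rows as a hypothetical stand-in for the result of $table->rows; format_row is an illustrative helper, not part of HTML::TableExtract.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what $table->rows returns: a list of array
# references, one per row. Empty cells are assumed to be undef.
my @rows = (
    [ 'Name',   'Qty', 'Note'  ],
    [ 'Widget', 3,     undef   ],
    [ undef,    7,     'spare' ],
);

# Illustrative helper (not part of HTML::TableExtract): join the cells of
# one row, substituting an empty string for any undefined cell.
sub format_row {
    my ($row) = @_;
    return join ', ', map { defined $_ ? $_ : '' } @$row;
}

print format_row($_), "\n" for @rows;
```

In the real script the same map expression could go straight into the print statement inside the loop over @rows.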
Borodin
  • Thanks Borodin. In my attempt to declutter the code I forgot the modules as well as use strict and use warnings. They are there in my actual code :). With this code, though, I get a data element hash reference like this one for each cell: HTML::ElementTable::DataElement=HASH(0x3b91ff8). Any idea how to get around that? – MicrobicTiger Apr 01 '14 at 02:25
  • @MicrobicTiger: have you removed the `tree` from `HTML::TableExtract`? As I said, you don't need it here and it just makes the code more complicated. You also appear to be missing a declaration for `$te`, or at least it isn't where it should be. You should put it where it is first used as I have in my own code. – Borodin Apr 01 '14 at 03:09
  • Ah, I see the problem. I have some uninitialized values but that works much better. Thanks @Borodin, I have appreciated your help these last few days. – MicrobicTiger Apr 01 '14 at 03:23