I have a subroutine, called from another script, that reads an HTML file. The code is below.

sub read_html
{
    $data=`cat "$_[0]"`;
    use HTML::TableExtract;
    print "CALLING read_html to read $_[0]\n";
    #my $self = shift;
    print "$_[1]";
    $te = HTML::TableExtract->new( headers => [($_[1])] );
    $te->parse($data);
    my $line_cnt=0;
    # Examine all matching tables
    foreach $ts ($te->tables)
    {
        if ($ts->rows ne "")
        {
            foreach $row ($ts->rows)
            {
                foreach (@$row) { $_='' unless defined $_; }
                print @$row;
                if (@$row[0] ne ' '  and @$row[0] ne ''  and
                    @$row[0] ne "\n" and @$row[0] ne "\t")
                {
                    $line_cnt++;
                }
            }
        }
        return $line_cnt;
    }
}

When I run the above script, it doesn’t show the HTML table data when the headers are passed in a variable:

$te = HTML::TableExtract->new( headers => [($_[1])] );

However, if I replace the expression $_[1] with hard-coded values as below, it returns all the column values under the specified headers:

$te = HTML::TableExtract->new(
    headers => [("PO Number",
                 "Invoice Number",
                 "DC Number",
                 "Store Number",
                 "Invoice Amount",
                 "Discount",
                 "Amount Paid")] );

I am calling the subroutine as read_html($file, $headers) where $file is a file name and $headers holds the header values, comma separated.

Any help would be greatly appreciated.

Greg Bacon
Vicks

1 Answer

"I am calling the subroutine as read_html($file, $headers) where $file is a file name and $headers as the header values comma separated."

The headers parameter of HTML::TableExtract->new expects a reference to an array of strings, where each string is a separate header. It sounds like you are instead passing it a reference to an array containing a single string containing comma characters.

my @headers = split m(\s*,\s*), $_[1];
$te = HTML::TableExtract->new( headers => \@headers );

If this is not correct, then your question needs to be more specific about how you are calling read_html.
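
To make the split above easier to check in isolation, here is a minimal sketch that uses only core Perl. The helper name parse_headers is hypothetical (it is not part of HTML::TableExtract), and it assumes the header string may carry stray whitespace or literal double quotes around each name, either of which would stop a name from matching the table's header text.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TableExtract): turn a
# comma-separated header string into a list of clean header names.
sub parse_headers {
    my ($raw) = @_;
    my @headers;
    for my $piece (split /,/, $raw) {
        my $name = $piece;            # work on a copy of each piece
        $name =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
        $name =~ s/^"(.*)"$/$1/;      # drop one pair of enclosing double quotes
        $name =~ s/^\s+|\s+$//g;      # trim whitespace that was inside the quotes
        push @headers, $name if length $name;   # skip empty entries
    }
    return @headers;
}

my @headers = parse_headers('"PO Number", "Invoice Number", DC Number');
print scalar(@headers), " headers: ", join(' | ', @headers), "\n";
# prints: 3 headers: PO Number | Invoice Number | DC Number
```

Inside read_html, the cleaned list could then be passed as headers => [ parse_headers($_[1]) ], which gives the constructor the array reference of separate strings it expects.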

Oktalist
  • Thanks, you are correct: I was calling it with a single string containing comma characters. – Vicks Jun 12 '13 at 04:52
  • Thanks, you are correct: I was calling it with a single string containing comma characters. However, I tried calling it after converting the string to an array as you suggested, but the result is the same. It does not return anything when passed as an array reference. – Vicks Jun 12 '13 at 05:12
  • I figured out the problem: when calling the subroutine I was passing the values in double quotes (""), so when I converted the string to an array, each element kept those quotes and did not match the headers in the file. I removed the quotes and it worked perfectly. Thanks. – Vicks Jun 12 '13 at 09:31
  • @YordanGeorgiev sure, start a bounty and then award it after the 24h grace period LOL ;) glad to have helped – Oktalist Jun 21 '13 at 23:07