
The input HTML is attached (my $file). With the following script, I cannot extract the table I want. Any suggestions?

use strict;
use warnings;
use HTML::TableExtract;

my $file="view-source_www.nasdaq.com_dividend-stocks_dividend-calendar.aspx_date=2017-Apr-19.html";
open DATA, '<', $file or die "cannot open $file: $!";

my $content;
{
    local $/ = undef; # slurp mode
    $content = <DATA>;
}
close DATA;

my $te;
$te = HTML::TableExtract->new( headers => [qw(Announcement_Date)] );
$te-> parse($content);

# Examine all matching tables
foreach my $ts ($te->tables) {
  print "Table (", join(',', $ts->coords), "):\n";
  foreach my $row ($ts->rows) {
     print join(',', @$row), "\n";
  }
}
Shicheng Guo
  • Your file is actually an HTML page which contains the (escaped) source of another page. You probably want the original HTML, not the encoded version of it. – jcaron Apr 17 '17 at 10:08
  • [HTML::TableExtract is beautiful](https://www.nu42.com/2012/04/htmltableextract-is-beautiful.html) ... "does not work" does not help. – Sinan Ünür Apr 17 '17 at 14:32

1 Answer

Two problems here.

Firstly, as jcaron points out in a comment, you're not parsing the right thing. You seem to be parsing a "view source" page. You need to get the HTML directly. You can do that with LWP::Simple.

use LWP::Simple;

my $url = 'http://www.nasdaq.com/dividend-stocks/dividend-calendar.aspx?date=2017-Apr-19';

my $content = get $url;
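
As far as I know, `get` from LWP::Simple returns `undef` when the request fails instead of dying, so it may be worth checking the result before handing it to the parser. A minimal sketch, where `fetch_html` is a hypothetical helper name of my own and the error message is just an illustration:

```perl
use strict;
use warnings;
use LWP::Simple;

# Hypothetical helper (not part of LWP::Simple): fetch a page or die.
sub fetch_html {
    my ($url) = @_;

    # get() returns undef on failure rather than throwing an exception
    my $content = get($url);
    die "Could not fetch $url\n" unless defined $content;

    return $content;
}

# Usage (needs network access, so it is left commented out here):
# my $content = fetch_html('http://www.nasdaq.com/dividend-stocks/dividend-calendar.aspx?date=2017-Apr-19');
```

If the site only answers over HTTPS, LWP will probably also need the LWP::Protocol::https module installed.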

Running your code now gives no errors but, unfortunately, it gives no output either. That's because you're defining the headers argument to the object constructor incorrectly. You use qw(Announcement_Date) but there is no table header with the value "Announcement_Date", so no matching table is found.

If you change the constructor call to this:

$te = HTML::TableExtract->new( headers => ['Announcement Date'] );

Then you get the expected output.
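
To see the difference between the two header strings without fetching anything, here is a small sketch that runs both against an inline table. The HTML sample and the `count_matching_tables` helper are made up for illustration; only the header text "Announcement Date" comes from the real page. It relies on my understanding that HTML::TableExtract matches each entry in `headers` as a pattern against the header cell text, so an underscore will not match a space.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up sample; only the "Announcement Date" header mirrors the real page.
my $html = <<'HTML';
<table>
  <tr><th>Company</th><th>Announcement Date</th></tr>
  <tr><td>Example Corp</td><td>4/10/2017</td></tr>
</table>
HTML

# Hypothetical helper: how many tables have a header matching $header?
sub count_matching_tables {
    my ($html, $header) = @_;
    my $te = HTML::TableExtract->new( headers => [$header] );
    $te->parse($html);
    my @tables = $te->tables;
    return scalar @tables;
}

# Compare the header string from the question with the corrected one.
for my $header ('Announcement_Date', 'Announcement Date') {
    printf "%-20s matched %d table(s)\n",
        "'$header'", count_matching_tables($html, $header);
}
```

Note that when `headers` is given, the rows returned contain only the columns under the matching headers, so you may want to list every column you need in that array.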

Dave Cross