I'm using WWW::Mechanize and HTML::TokeParser to parse a website for updates. I can't share details of the website because it requires a login. The site essentially has a table of data. I parse the HTML until I reach the first row of the table and check whether its value matches the value from my last scrape; if it doesn't, I send a mail. This works perfectly when I test it against existing table entries. When actual updates happen, however, the scraping doesn't stop at my last scrape: it keeps sending mails until the table is exhausted, and then repeats this indefinitely. I can't figure out what is happening. I know there isn't much anyone can verify without the website, but I'm posting my code anyway. I'd appreciate ideas on what could be going wrong.
code:
    # $comid is the value recorded on the last scrape; $mechlink is the page URL.
    # Note: $tag, $dt, $name, $info, $link, $tcount and $retval are not declared
    # with "my", so they are package globals.
    sub func {
        my ($comid, $mechlink) = @_;
        my $mechanize = WWW::Mechanize->new(
            noproxy     => 0,
            stack_depth => 5,
            autocheck   => 1
        );
        $mechanize->proxy( https => undef );

        # Fetch the page; on any failure keep the old marker.
        eval {
            my $me = $mechanize->get($mechlink);
            $me->is_success or die $me->status_line;
        };
        return $comid if ($@);

        my $stream = HTML::TokeParser->new( \$mechanize->{content} ) or die $!;
        while ( $tag = $stream->get_tag('td') ) {
            if ( $tag->[1]{class} eq 'dateStamp' ) {
                $dt = $stream->get_trimmed_text('/td');

                # Advance two tags to reach what should be the Name cell.
                $tag = $stream->get_tag;
                $tag = $stream->get_tag;
                $name = $stream->get_trimmed_text('/td') if ( $tag->[1]{class} eq 'Name' );
                return $comid unless ( $tag->[1]{class} eq 'Name' );

                # Advance four tags to reach the cell compared against $comid.
                $tag = $stream->get_tag;
                $tag = $stream->get_tag;
                $tag = $stream->get_tag;
                $tag = $stream->get_tag;
                $info = $stream->get_trimmed_text('/td');
                print "$name?\n";

                # Stop at the row recorded on the last scrape.
                return $retval if ( $info eq $comid );
                print "You've Got Mail! $info $comid\n";

                # Remember the first new row seen as the next marker.
                $tcount++;
                $retval = $info if ( $tcount == 1 );

                # Advance three tags to reach the link for the attachment.
                $tag = $stream->get_tag;
                $tag = $stream->get_tag;
                $tag = $stream->get_tag;
                $link = "http://www.abc.com" . $tag->[1]{href} if ( $tag->[0] eq 'a' );

                # Send the notification mail through Outlook.
                my $outlook = Mail::Outlook->new();
                my $message = $outlook->create();
                $message->To('abc@def.com');
                $message->Cc('abc@def.com;abc@def.com');
                my $hd = "$name - $info";
                $message->Subject($hd);
                $message->Body(" ");
                $message->Attach($link);
                $message->send;
            }
        }
    }
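
To narrow things down, the stop-at-last-scrape comparison can be looked at on its own, without the network and mail parts. The following is only a minimal sketch, assuming the row values have already been extracted into a list with the newest row first; `new_rows_since` and the `r1`..`r4` values are hypothetical and not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the marker from the previous scrape and the row
# values in page order (newest first). Returns the marker to remember for the
# next scrape, followed by the rows that are new.
sub new_rows_since {
    my ($last_seen, @rows) = @_;
    my @fresh;
    for my $info (@rows) {
        # Stop as soon as the previously recorded row is reached.
        last if defined $last_seen && $info eq $last_seen;
        push @fresh, $info;
    }
    # The newest row becomes the marker; keep the old one if nothing is new.
    my $marker = @fresh ? $fresh[0] : $last_seen;
    return ($marker, @fresh);
}

my ($marker, @fresh) = new_rows_since('r2', qw(r4 r3 r2 r1));
print "marker=$marker new=@fresh\n";    # prints "marker=r4 new=r4 r3"
```

All state in this sketch is lexical and a marker is returned on every path, so a second call cannot see leftovers from the first. In `func`, by contrast, `$tcount` and `$retval` are not declared with `my`, so they keep their values between calls, and there is no explicit `return` once the loop runs out of rows. Both may be worth checking.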