What does this HTML::Parser() code do in Perl?

Question

I came across some Perl code that uses HTML::Parser, like this:

my $p = HTML::Parser->new(text_h => [ sub {$text .= shift}, 
                                  'dtext']);

Please help me understand what it does.

What is your question, exactly?! Do you want a tutorial on the Perl language and its syntax? Do you have questions about the specifics of the `HTML::Parser` module? Or something in-between? — Biffen, Jun 17 '14 at 07:30
Please change the title to your actual question, so people who have a similar question will be able to find yours. — reinierpost, Jun 17 '14 at 14:44

score 2 · Accepted Answer · answered Jun 17 '14 at 06:50

From the documentation:

$p = HTML::Parser->new(api_version => 3,
                       text_h => [ sub {...}, "dtext" ]);

This creates a new parser object with a text event handler subroutine that receives the original text with general entities decoded.
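
To make the difference between raw and decoded text concrete, here is a minimal sketch that asks for both the "text" and "dtext" argspecs in one handler. The sample HTML string and the variable names $raw and $decoded are made up for illustration and are not part of the original question.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input, not taken from the question.
my $html = '<p>Fish &amp; chips &lt; 5</p>';

my ($raw, $decoded) = ('', '');

my $p = HTML::Parser->new(
    api_version => 3,
    # The argspec 'text,dtext' passes two arguments to the handler:
    # the text as written in the source, and the same text with
    # entities such as &amp; decoded.
    text_h => [
        sub {
            my ($text, $dtext) = @_;
            $raw     .= $text;
            $decoded .= $dtext;
        },
        'text,dtext',
    ],
);

$p->parse($html);
$p->eof;    # signal end of input so any buffered text is flushed

print "text:  $raw\n";
print "dtext: $decoded\n";
```

With this input, the "text" line should keep `&amp;` and `&lt;` as written, while the "dtext" line should show `&` and `<`.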

Edit:

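use strict;
use warnings;
use HTML::Parser;
use LWP::Simple;
# get() returns undef if the request fails.
my $html = get("http://perltraining.stonehenge.com");
defined $html or die "Could not fetch the page";
# With an array reference as the handler, each text event is pushed onto @accum.
HTML::Parser->new(text_h => [\my @accum, "text"])->parse($html)->eof;
print map $_->[0], @accum;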

Another example:

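#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my $text = '';
# Append each chunk of decoded text ("dtext") to $text.
my $p = HTML::Parser->new(text_h => [ sub { $text .= shift },
                                      'dtext' ]);
# parse_file returns undef if the file cannot be opened.
$p->parse_file('test.html') or die "Cannot parse test.html: $!";
print $text;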

Which, when used on a file like this:

<html>
<head>
<title>Test</title>
</head>
<body>
<h1>Test Stuff</h1>
<p>This is a test</p>
<ul>
<li>this</li>
<li>is a</li>
<li>list</li>
</ul>
</body>
</html>

produces the following output:

Test


Test Stuff
This is a test

this
is a
list
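
The blank lines in that output appear because the newlines between tags are reported as text events too. If they are unwanted, one option is to skip whitespace-only chunks in the handler. This is only a sketch: the inline HTML string and the @chunks array are illustrative, and it uses the `unbroken_text` option so that a run of text is not split across several events.

```perl
use strict;
use warnings;
use HTML::Parser;

# Illustrative input: the list from the test file above.
my $html = "<ul>\n<li>this</li>\n<li>is a</li>\n<li>list</li>\n</ul>\n";

my @chunks;
my $p = HTML::Parser->new(
    api_version   => 3,
    unbroken_text => 1,    # deliver each run of text in one event
    text_h        => [
        sub {
            my $t = shift;
            # Keep only chunks that contain a non-whitespace character.
            push @chunks, $t if $t =~ /\S/;
        },
        'dtext',
    ],
);

$p->parse($html);
$p->eof;

print join("\n", @chunks), "\n";
```

For this input, @chunks should end up holding just the three list items.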

Does that help?

answered Jun 17 '14 at 06:50

Chankey Pathak


I went through this, but could not understand it. Can you give an example? – RosAng Jun 17 '14 at 06:51
Yes, it helps a lot. So you mean it removes the tags and fetches only the text inside them? – RosAng Jun 17 '14 at 07:00
Yes, as the documentation says "text event handler subroutine that receives the original text with general entities decoded." – Chankey Pathak Jun 17 '14 at 07:06
@RosAng: would you consider accepting this answer, please? – halfer Aug 01 '15 at 16:01
