HTML::TreeBuilder::XPath missing last tag in result

Question

use WWW::Mechanize;
use HTML::TreeBuilder::XPath;
my $mech = WWW::Mechanize->new;
my $tree = HTML::TreeBuilder::XPath->new;
my $url = "http://www.elaws.gov.bw/wondersbtree.php";
$mech->get($url);
$tree->parse($mech->content());
$tree->eof;    # signal end of input so the tree is complete
my @nodes = $tree->findnodes("//p[font = 'PRINCIPAL LEGISLATION']");
print $nodes[0]->as_HTML;    # $nodes[0], not @nodes[0], for a single element

The above code prints out the HTML element searched for, but it is missing the final </p> tag. Why? Is this intentional or is it a bug in the module?

score 2 · Answer 1 · edited Jun 20 '20 at 09:12

By default, the as_HTML method omits certain optional end tags:

as_HTML
$s = $h->as_HTML();
$s = $h->as_HTML($entities);
$s = $h->as_HTML($entities, $indent_char);
$s = $h->as_HTML($entities, $indent_char, \%optional_end_tags);
[ ... ]

If \%optional_end_tags is specified and defined, it should be a reference to a hash that holds a true value for every tag name whose end tag is optional. Defaults to \%HTML::Element::optionalEndTag, which is an alias to %HTML::Tagset::optionalEndTag, which, at time of writing, contains true values for p, li, dt, dd. A useful value to pass is an empty hashref, {}, which means that no end-tags are optional for this dump.

For example:

use strict;
use warnings 'all';
use 5.010;

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content('<p>foo</p>');
my @nodes = $tree->findnodes('//p');

say $nodes[0]->as_HTML(undef, undef, {});

Output:

<p>foo</p>

Note that you should always include use strict; and use warnings 'all'; at the top of your code.
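The third argument does not have to be empty. As a minimal sketch, assuming you want closing tags written for p but are happy to leave them off li, you can pass your own hash. The sample markup and the %optional hash below are made up for illustration; only as_HTML and findnodes come from the modules discussed above.

```perl
use strict;
use warnings 'all';

use HTML::TreeBuilder::XPath;

# Illustrative markup, not taken from the question's page.
my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<ul><li>one</li><li>two</li></ul><p>foo</p>'
);

# Hypothetical choice: only </li> may be omitted in the dump.
my %optional = ( li => 1 );

my ($p)  = $tree->findnodes('//p');
my ($ul) = $tree->findnodes('//ul');

# p is not in %optional, so its end tag is written out.
my $p_html = $p->as_HTML(undef, undef, \%optional);

# li is in %optional, so the list items are dumped without </li>.
my $ul_html = $ul->as_HTML(undef, undef, \%optional);

print $p_html, "\n", $ul_html, "\n";
```

Tags missing from the hash are treated as having a required end tag, which is why the empty hashref {} closes everything.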

score 0 · Answer 2 · answered May 31 '16 at 02:12

In HTML, the end tag is optional for P elements.
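This holds even when the source document does contain the end tag. As far as I can tell, the parsed tree keeps no record of whether </p> was present in the input, so as_HTML has nothing to go on except its table of optional end tags. A rough sketch of that, using two made-up input strings:

```perl
use strict;
use warnings 'all';

use HTML::TreeBuilder::XPath;

my @dumps;

# Same paragraph, once with and once without the end tag in the source.
for my $source ('<p>foo</p>', '<p>foo') {
    my $tree = HTML::TreeBuilder::XPath->new_from_content($source);
    my ($p)  = $tree->findnodes('//p');

    # Default call: optional end tags are left out of the dump.
    push @dumps, $p->as_HTML;

    $tree->delete;    # free the tree once we are done with it
}

print $dumps[0] eq $dumps[1]
    ? "both sources produce the same dump\n"
    : "the dumps differ\n";
```

Both inputs build the same p element, so the serialized form depends only on the arguments given to as_HTML.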

ikegami

The original HTML source does include the end tag. – CJ7 May 31 '16 at 02:38
