use WWW::Mechanize;
use HTML::TreeBuilder::XPath;

my $mech = WWW::Mechanize->new;
my $tree = HTML::TreeBuilder::XPath->new;
my $url  = "http://www.elaws.gov.bw/wondersbtree.php";

$mech->get($url);
$tree->parse($mech->content);
$tree->eof;    # signal end of input so the tree is complete

# Find the <p> whose <font> child reads 'PRINCIPAL LEGISLATION'
my @nodes = $tree->findnodes("//p[font = 'PRINCIPAL LEGISLATION']");
print $nodes[0]->as_HTML;

The above code prints out the HTML element searched for, but it is missing the final </p> tag. Why? Is this intentional or is it a bug in the module?


2 Answers

By default, the as_HTML method omits certain optional end tags:

as_HTML

$s = $h->as_HTML();
$s = $h->as_HTML($entities);
$s = $h->as_HTML($entities, $indent_char);
$s = $h->as_HTML($entities, $indent_char, \%optional_end_tags);

[ ... ]

If \%optional_end_tags is specified and defined, it should be a reference to a hash that holds a true value for every tag name whose end tag is optional. Defaults to \%HTML::Element::optionalEndTag, which is an alias to %HTML::Tagset::optionalEndTag, which, at time of writing, contains true values for p, li, dt, dd. A useful value to pass is an empty hashref, {}, which means that no end-tags are optional for this dump.

For example:

use strict;
use warnings 'all';
use 5.010;

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content('<p>foo</p>');
my @nodes = $tree->findnodes('//p');

say $nodes[0]->as_HTML(undef, undef, {});

Output:

<p>foo</p>
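
To make the difference visible, here is a small sketch that dumps the same element with and without the empty hashref, and lists the tags the installed HTML::Tagset treats as having optional end tags. The variable names are only illustrative, and the exact list printed depends on the HTML::Tagset version installed.

```perl
use strict;
use warnings 'all';
use 5.010;

use HTML::Tagset;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content('<p>foo</p>');
my ($p)  = $tree->findnodes('//p');

# Default call: end tags listed in %HTML::Tagset::optionalEndTag are omitted.
my $default_html = $p->as_HTML;

# Empty hashref: no end tag is treated as optional for this dump.
my $full_html = $p->as_HTML(undef, undef, {});

say "default: $default_html";
say "full:    $full_html";

# Tags whose end tag is optional according to the installed HTML::Tagset.
my @optional = sort grep { $HTML::Tagset::optionalEndTag{$_} }
                    keys %HTML::Tagset::optionalEndTag;
say "optional end tags: @optional";
```

Only the second dump should carry the closing </p>, which is the behaviour the question ran into.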

Note that you should always use strict; and use warnings 'all';.


In HTML, the end tag is optional for P elements.
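
As a rough illustration of that rule, the sketch below feeds the parser two paragraphs with no </p> at all. It assumes HTML::TreeBuilder's default implicit-tag handling, where the second <p> start tag closes the first paragraph; the sample markup is made up for the example.

```perl
use strict;
use warnings 'all';
use 5.010;

use HTML::TreeBuilder;

# Neither paragraph has an explicit </p>.
my $tree = HTML::TreeBuilder->new_from_content('<p>one<p>two');

# Collect every <p> element the parser built from that input.
my @paragraphs = $tree->look_down(_tag => 'p');

say scalar(@paragraphs), " paragraph(s) found";
say $_->as_text for @paragraphs;
```

The parser should report two sibling paragraphs rather than one nested inside the other, so omitting </p> on output loses no information.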
