
The code below shows that the HTML::TreeBuilder method look_down cannot find the "section" element. Why?

use strict;
use warnings;
use HTML::TreeBuilder;

my $html =<<'END_HTML';
<html>
<head><title></title></head>
<body>
<div attrname="div">
<section attrname="section">
</section>
</div>
</body>
</html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";

my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";

$tree->delete();

Output:

number of div elements found = 1
number of section elements found = 0

Shang Zhang

2 Answers

my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";

This found one element because it matched the attribute attrname with value div that happened to be on <div> tag.

my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";

This matches nothing because the parsed tree contains no element with an attribute named attrname and value section.

They should be

my @divs = $tree->look_down(_tag => 'div');
...
my @sections = $tree->look_down(_tag => 'section');

This is all somewhat obtusely explained in the HTML::Element#lookdown documentation. There's no clear explanation of what a "criterion" is, and you'd have to read the entire page to find that the pseudo-attribute _tag refers to the tag name... but then carefully reading the entire page would probably save you hours of frustration in the long run :-)
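
As a rough illustration of the kinds of criteria look_down accepts, the sketch below uses an attribute/value pair, the _tag pseudo-attribute, and a code reference. The sample markup and variable names are made up for illustration, and it sticks to <div> and <p>, which the parser keeps.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, not taken from the question.
my $html = '<html><body><div class="box"><p class="note">hi</p></div></body></html>';
my $tree = HTML::TreeBuilder->new_from_content($html);

# Criterion form 1: attribute name and exact value.
my @by_attr = $tree->look_down(class => 'note');

# Criterion form 2: the pseudo-attribute _tag matches the tag name.
my @by_tag = $tree->look_down(_tag => 'div');

# Criterion form 3: a code reference, called with each candidate element.
my @by_code = $tree->look_down(sub { ($_[0]->attr('class') // '') =~ /^b/ });

# Criteria can be combined; an element must satisfy all of them.
my @combined = $tree->look_down(_tag => 'p', class => 'note');

printf "%d %d %d %d\n",
    scalar @by_attr, scalar @by_tag, scalar @by_code, scalar @combined;

$tree->delete;
```

In list context look_down returns every matching element, which is why each result is collected into an array before counting.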

Jim Garrison
  • Thanks Jim. I will read through the documentation fully. But I also tried the find("section") method, which also found nothing. So it appears that "section" is not regarded as a valid tag, and <section> is not regarded as an element? Sorry for my lack of understanding of HTML basics.
    – Shang Zhang Jul 16 '19 at 23:44

This worked for me:

my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0);  # <-- Include unknown elements in tree
$tree->parse($html);
$tree->eof;                # Signal end of input so the tree is finalized
my @divs = $tree->look_down('attrname', 'div');
my @sections = $tree->look_down('attrname', 'section');
print "number of div elements found = ", scalar(@divs), "\n";
print "number of section elements found = ", scalar(@sections), "\n";

Output:

number of div elements found = 1
number of section elements found = 1
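
To see the effect of the flag in isolation, a small sketch can parse the same markup twice and count <section> elements by tag name. By default HTML::TreeBuilder skips tags it does not recognize, and whether <section> is recognized likely depends on the installed HTML::Tagset version, so the first count may differ between systems. The helper name count_sections is hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '<html><body><div attrname="div">'
         . '<section attrname="section"></section></div></body></html>';

# Hypothetical helper: parse $markup with the given ignore_unknown
# setting and return how many <section> elements end up in the tree.
sub count_sections {
    my ($markup, $ignore_unknown) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown($ignore_unknown);
    $tree->parse($markup);
    $tree->eof;                      # finish parsing
    my @found = $tree->look_down(_tag => 'section');
    $tree->delete;                   # free the tree
    return scalar @found;
}

print "ignore_unknown(1): ", count_sections($html, 1), "\n";
print "ignore_unknown(0): ", count_sections($html, 0), "\n";
```

The same reasoning should apply to find("section"), since it searches the same parsed tree: an element dropped by the parser cannot be found by any lookup method.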
Håkon Hægland