I am using HTML::TreeBuilder to process HTML files. These files can contain definition lists in which the term "Database:" is followed by a definition holding the database name. A simulated example looks like this:

#!/usr/bin/perl -w

use strict;
use warnings;
use HTML::TreeBuilder 5 -weak;
use feature qw( say );

my $exampleContent = '<dl>  
    <dt data-auto="citation_field_label"> 
    <span class="medium-bold">Language:</span> 
    </dt> 
    <dd data-auto="citation_field_value"> 
    <span class="medium-normal">English</span>
    </dd>
    <dt data-auto="citation_field_label"> 
    <span class="medium-bold">Database:</span> 
    </dt> 
    <dd data-auto="citation_field_value"> 
    <span class="medium-normal">Data Archive</span>
    </dd> 
    </dl>';

my $root = HTML::TreeBuilder->new_from_content($exampleContent);

my $dlist = $root->look_down("_tag" => "dl");

foreach my $e ($dlist->look_down("_tag" => 'dt', "data-auto" => "citation_field_label")) {
    if ($e->as_text =~ m/Datab.*/) {
        say $e->as_text; # I have found "Database:" 'dt' field
        # now I need to go to the next field 'dd' and return the value of that
    }
}

I need to identify which database the file has come from and return the value.

Once I have identified the <dt> containing "Database:", I would like to write something like say $dlist->right()->as_text; but I do not know how. Your thoughts would be much appreciated.

r0berts

1 Answer

You were almost there. Call right on the matched <dt> element ($e) rather than on $dlist:

$e->right->as_text;

This gives me "Data Archive".
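
For context, here is a minimal sketch of the whole lookup. In scalar context, `right` returns the immediate right sibling of the element, or undef if there is none. A sibling can also be a plain text string rather than an element, so the sketch checks for that before calling a method on it. The helper name `field_value` and the shortened HTML string are my own assumptions for illustration, and `as_trimmed_text` is used in place of `as_text` so that surrounding whitespace is stripped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw( say );
use HTML::TreeBuilder 5 -weak;

# Same structure as the sample in the question, shortened for readability.
my $html = '<dl>'
    . '<dt data-auto="citation_field_label"><span>Language:</span></dt>'
    . '<dd data-auto="citation_field_value"><span>English</span></dd>'
    . '<dt data-auto="citation_field_label"><span>Database:</span></dt>'
    . '<dd data-auto="citation_field_value"><span>Data Archive</span></dd>'
    . '</dl>';

my $root = HTML::TreeBuilder->new_from_content($html);

# field_value is a hypothetical helper, not part of HTML::TreeBuilder.
# It returns the text of the <dd> that directly follows the first <dt>
# whose text matches $label_re, or undef if there is no such pair.
sub field_value {
    my ($tree, $label_re) = @_;
    for my $dt ($tree->look_down(_tag => 'dt', 'data-auto' => 'citation_field_label')) {
        next unless $dt->as_trimmed_text =~ $label_re;
        my $next = $dt->right;    # scalar context: immediate right sibling or undef
        # Siblings may be plain strings (text nodes), so test for an element first.
        return $next->as_trimmed_text if ref $next && $next->tag eq 'dd';
    }
    return;
}

my $database = field_value($root, qr/^Database:/);
say $database // 'not found';
```

If the real pages put other markup between a <dt> and its <dd>, the `tag eq 'dd'` check makes the helper return undef instead of the wrong element's text; walking the list returned by `right` in list context would be one way to handle that case.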

choroba
  • Great, thanks. :-) Sometimes thinking of TreeBuilder and HTML my head hurts, but it is worth it. So `right` gives the next element at the same level, right? – r0berts Jan 25 '20 at 15:37