I need to extract the names and email addresses of all the people listed in this HTML.

<thead>
        <tr>
            <th scope="col" class="rgHeader" style="text-align:center;">Name</th><th scope="col" class="rgHeader" style="text-align:center;">Email Address</th><th scope="col" class="rgHeader" style="text-align:center;">School Phone</th>
        </tr>
    </thead><tbody>
    <tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__0">
        <td>
                            Michael Bowen
                        </td><td>mbowen@cpcisd.net</td><td>903-488-3671 ext3200</td>
    </tr><tr class="rgAltRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__1">
        <td>
                            Christian Calixto
                        </td><td>calixtoc@cpcisd.net</td><td>903-488-3671 x 3430</td>
    </tr><tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__2">
        <td>
                            Rachel Claxton
                        </td><td>claxtonr@cpcisd.net</td><td>903-488-3671 x 3450</td>
    </tr>
    </tbody>

</table><input id="ctl00_ContentPlaceHolder1_rg_People_ClientState" name="ctl00_ContentPlaceHolder1_rg_People_ClientState" type="hidden" autocomplete="off">    </div>


        <br>

I know how to use HTML::TreeBuilder to search for nodes, and I'm already using this code elsewhere in my script.

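my ($file) = @_;
my $html = path($file)->slurp;    # path() is presumably Path::Tiny's
my $tree = HTML::TreeBuilder->new_from_content($html);
my @nodes = $tree->look_down(_tag => 'input');
my $val;
foreach my $node (@nodes) {
    # look_down returns undef in scalar context when nothing matches,
    # so skip inputs whose name does not match before calling attr
    my $match = $node->look_down('name', qr/\$txt_Website/) or next;
    $val = $match->attr('value');
}
return $val;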

I was going to use the same approach for this function, but I realized that there isn't much to search on, since <td> tags appear in so many other places in the page. I'm sure there's a better way to approach this problem, but I can't seem to find it.

LINK TO HTML CODE: http://pastebin.com/qLwu80ZW

MY CODE: https://pastebin.com/wGb0eXmM

Note: I searched Google as much as I could, but I'm not sure what I should search for.

1 Answer

The `table` element that encloses the data you need has a unique class, `rgMasterTable`, so you can search for that with `look_down`.

I've written this to demonstrate. It pulls the HTML directly from your pastebin.

use strict;
use warnings 'all';

use LWP::Simple 'get';
use HTML::TreeBuilder;

use constant URL => 'http://pastebin.com/raw/qLwu80ZW';

my $tree = HTML::TreeBuilder->new_from_content(get URL);

my ($table) = $tree->look_down(_tag => 'table', class => 'rgMasterTable');

for my $tr ( $table->look_down(_tag => 'tr') ) {

    next unless my @td = $tr->look_down(_tag => 'td');

    my ($name, $email) = map { $_->as_trimmed_text } @td[0,1];

    printf  "%-17s %s\n", $name, $email;
}

output

Michael Bowen     mbowen@cpcisd.net
Christian Calixto calixtoc@cpcisd.net
Rachel Claxton    claxtonr@cpcisd.net
Borodin
  • Absolutely stupendous! Not all heroes wear capes ;D – Ultracrepidarian Mar 24 '17 at 20:27
  • I've just uncovered an error. When I try to run the script, I get an error on your snippet of code. I feel like it could be something to do with what @zdim said above: " Just noticed that one of the three is – Ultracrepidarian Mar 24 '17 at 20:54
  • @Ultracrepidarian: My solution works with the data you have published. The class of the `tr` elements is irrelevant. I can't debug your code without seeing it, and it needs to be another question if you can't fix it yourself. Basically you're calling `look_down` on something without checking whether the previous search succeeded; the only place my code does that is looking for the initial `table` element. – Borodin Mar 24 '17 at 21:13
  • @Ultracrepidarian: What have you changed from my code? – Borodin Mar 24 '17 at 21:25
  • @Ultracrepidarian As for my comment you quote, Borodin is using `rgMasterTable` (class) from the `<table>` itself, and then searches for the `<tr>` and then `<td>` elements (tags). So this code doesn't depend on `rgRow|rgAltRow` (or whatever the class name of `td` may be). You must have changed something in this code so look carefully through all of it. – zdim Mar 25 '17 at 05:39
  • Thanks for the clarification, zdim. @Borodin I've changed a few things. I've updated my thread to have a link to my full code. The part I'm having trouble with is line 169 to 179. – Ultracrepidarian Mar 27 '17 at 19:33
  • I've fixed that error by just extracting the file directly from the pastebin, as @Borodin did. But now I'm getting another error saying `Error open (<) on '1.html': No such file or directory at spreadsheet.pl line 18.` I have no idea why it might be saying this, or even where to start. – Ultracrepidarian Mar 27 '17 at 20:19
  • @Ultracrepidarian: You're raising new issues now, and you need to open a new question. – Borodin Mar 27 '17 at 22:01
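
The point raised in the comments, that each `look_down` result should be checked before it is used, can be sketched as follows. This is only an illustration under stated assumptions: `extract_people` is a hypothetical helper name, and the inline sample HTML is modelled on the excerpt in the question (with the `rgMasterTable` class the answer mentions) so that the example runs without downloading the pastebin.

```perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Sample markup modelled on the question's table. Assumption: the real page
# wraps these rows in <table class="rgMasterTable">, as the answer states.
my $html = <<'HTML';
<table class="rgMasterTable">
  <thead><tr><th>Name</th><th>Email Address</th><th>School Phone</th></tr></thead>
  <tbody>
    <tr class="rgRow"><td> Michael Bowen </td><td>mbowen@cpcisd.net</td><td>903-488-3671 ext3200</td></tr>
    <tr class="rgAltRow"><td> Christian Calixto </td><td>calixtoc@cpcisd.net</td><td>903-488-3671 x 3430</td></tr>
  </tbody>
</table>
HTML

# extract_people is a hypothetical helper, not part of HTML::TreeBuilder.
# It returns a list of [name, email] pairs, or an empty list if the
# expected table is not present in the content.
sub extract_people {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);

    # In list context look_down returns every match; take the first.
    my ($table) = $tree->look_down(_tag => 'table', class => 'rgMasterTable');
    unless ($table) {
        $tree->delete;
        return;    # nothing found: empty list instead of a method call on undef
    }

    my @people;
    for my $tr ( $table->look_down(_tag => 'tr') ) {
        my @td = $tr->look_down(_tag => 'td');
        next unless @td >= 2;    # skips the header row, which has only <th> cells
        push @people, [ map { $_->as_trimmed_text } @td[0,1] ];
    }

    $tree->delete;
    return @people;
}

my @people = extract_people($html);
printf "%-17s %s\n", @$_ for @people;
```

Returning an empty list when the table is missing lets the caller decide whether that is an error, instead of the script dying on a method call against an undefined value.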