1

I am trying to use a website's built-in search function to collect data from it, but I can't work out how to press the 'search' button: it is wrapped in some JavaScript, and its id changes each time the page is loaded.

The HTML for the relevant section of the site is below.

<html>
 <head>
 </head>
 <body>
  <table>
   <tr>
    <td>
    <td>
     <table>
      <tr>
       <td>
        <!-- start of toolbar Main -->
        <table>
         <tr>
          <td>
           <table>
            <tr class="buttonPad">
            </tr>
            <tr>
   *          <td nowrap="true" valign="top" class="button"><a id="S7674" accesskey="S" class="button" title="SEARCH" onclick="dispatch('S7674');"><u>S</u></a></td>
            </tr>
           </table>
          </td>
          <td></td>
         </tr>
        </table>
      </td>
      </tr>
     </table>
    </td>
    </td>
   </tr>
  </table>
 </body>
</html>

And my code:

   my $tree= HTML::TreeBuilder::XPath->new;
      $tree->parse($url);

   my @nodes = $tree->findnodes('/html/body/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr/td/table.buttonSpace/tbosy/tr/a.button')->get_nodelist; # line is modified later.
   my $nodecount = scalar(@nodes);

   if ($nodecount > 0 ) { print "we found something\n"; }
   else { print "nothing found\n"; } 

   foreach my $node (@nodes)
   {
      print "node is $node\n";
      my $id = $node->findvalue('button');
      print "my id is $id\n";
   }

Sadly my code doesn't return any node values.

Many thanks in advance.

Micro

MicrobicTiger

2 Answers

1

This seems to work:

use strict;
use warnings;
use HTML::TreeBuilder;
use Data::Dumper;

my $html = <<HTML;
<html>
 <head>
 </head>
 <body>
  <table>
   <tr>
    <td>
    <td>
     <table>
      <tr>
       <td>
        <!-- start of toolbar Main -->
        <table>
         <tr>
          <td>
           <table>
            <tr class="buttonPad">
            </tr>
            <tr>
            <td nowrap="true" valign="top" class="button"><a id="S7674" accesskey="S" class="button" title="SEARCH" onclick="dispatch('S7674');"><u>S</u></a></td>
            </tr>
           </table>
          </td>
          <td></td>
         </tr>
        </table>
      </td>
      </tr>
     </table>
    </td>
    </td>
   </tr>
  </table>
 </body>
</html>
HTML

my $tree = HTML::TreeBuilder->new_from_content( $html );
foreach my $atag ( $tree->look_down( _tag => q{a}, 'class' => 'button', 'title' => 'SEARCH' ) ) {
    print Dumper $atag->attr('id');
}
user353255
  • Thanks dude, this seems to work nicely on this HTML data but not when I try it on the website itself. (Note: I tidied this data a lot for this post.) Is there something I should do to the URL data before running this? – MicrobicTiger Jan 30 '14 at 21:06
  • And how would I assign the 'id' to a variable so that I can use it for a button click later? – MicrobicTiger Jan 30 '14 at 21:12
  • Post the unaltered html. You might find something like this much easier to use for your purposes: https://pypi.python.org/pypi/selenium – user353255 Jan 30 '14 at 22:18
  • Would you be able to give me an example of the post? I have very little experience with Python, so I suspect it'll be even more painful. – MicrobicTiger Jan 30 '14 at 22:39
  • Example one on that page is a good place to start. `from selenium import webdriver from selenium.webdriver.common.keys import Keys browser = webdriver.Firefox() browser.get('http://www.yahoo.com') assert 'Yahoo!' in browser.title elem = browser.find_element_by_name('p') # Find the search box elem.send_keys('seleniumhq' + Keys.RETURN) browser.quit()` – user353255 Jan 30 '14 at 22:51
  • Ah, I should have been a bit clearer. I'm trying to use Perl for this. I don't really want to move over to Python. – MicrobicTiger Jan 30 '14 at 23:08
0

You could maybe try a simpler XPath query. You don't need to have the whole hierarchy there, that's overkill. And hard to get right: your HTML doesn't include the tbody that you have in your query (nor the tbosy that you also have ;--).

Try this if the way you identify the element is through the button class and title:

$tree->findnodes( '//td[@class="button"]/a[@class="button" and @title="SEARCH"]')
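
A minimal sketch of how that query might be used to capture the changing id in a variable, assuming HTML::TreeBuilder::XPath is installed. The `$html` string is a cut-down stand-in for the real page, and `$search_id` is just an illustrative name. The tree is built from the page's HTML text rather than from its URL, so on the live site the content would have to be fetched first.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Stand-in for the fetched page content (the parser takes HTML text, not a URL).
my $html = <<'HTML';
<table><tr>
<td nowrap="true" valign="top" class="button"><a id="S7674" accesskey="S" class="button" title="SEARCH" onclick="dispatch('S7674');"><u>S</u></a></td>
</tr></table>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# In list context findnodes returns the matching elements; keep the first one.
my ($link) = $tree->findnodes(
    '//td[@class="button"]/a[@class="button" and @title="SEARCH"]'
);

# Read the id attribute so the changing value never has to be hard-coded.
my $search_id = defined $link ? $link->attr('id') : undef;

print defined $search_id
    ? "search id is $search_id\n"
    : "no search link found\n";

# Free the tree once the value has been copied out.
$tree->delete;
```

Because the match is on class and title rather than on the id itself, the same query should keep working when the page hands out a different id, as long as those two attributes stay stable.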
mirod