I am trying to write a Perl script which will automatically key in search variables on this LexisNexis search page and retrieve the search results.

I am using the WWW::Mechanize module, but I am not sure how to find the field name of the search bar itself. This is the script I have so far:

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
my $m = WWW::Mechanize->new();

my $url = "http://www.lexisnexis.com/hottopics/lnacademic/?verb=sr&csi=379740";
$m->get($url);

$m->form_name('f');
$m->field('q', 'Test');
my $response = $m->submit();
print $response->content();

However, I think the name of the search box on this website is not "q". I am getting the following error: "Can't call method "value" on an undefined value at site/lib/WWW/Mechanize.pm line 1442." Any help is much appreciated. Thank you!

reinierpost
Amritha
  • FYI, the [Terms of Service](https://www.lexisnexis.com/terms/general.aspx) for that site say, *"Use of the Online Services via mechanical, programmatic, robotic, scripted or any other automated means is strictly prohibited. Unless otherwise agreed to by LN in writing, use of the Online Services is permitted only via manually conducted, discrete, individual search and retrieval activities."* – ThisSuitIsBlackNot Jan 26 '15 at 18:15
  • This is for an academic research project and we have acquired the necessary permissions to conduct this automation. Thank you. – Amritha Jan 26 '15 at 18:18
  • Good to know. Most people who try to scrape websites don't even bother reading the TOS. – ThisSuitIsBlackNot Jan 26 '15 at 18:25
  • I'll bet you a nickel that the search page has JavaScript in it to manipulate the page, and Mech doesn't know from JavaScript. Use the mech-dump utility to see what the forms and fields are on the page, and why they differ from what you expect. – Andy Lester Jan 26 '15 at 18:49
  • Thank you for your comment! I just realized that the search page has JavaScript. Do any other modules support JavaScript? – Amritha Jan 26 '15 at 18:51
  • @Amritha There are several, such as [`WWW::Mechanize::Firefox`](https://metacpan.org/pod/WWW::Mechanize::Firefox) and [`WWW::Mechanize::PhantomJS`](https://metacpan.org/pod/WWW::Mechanize::PhantomJS). – ThisSuitIsBlackNot Jan 26 '15 at 19:02
  • According to [perlmonks](http://www.perlmonks.org/index.pl?node_id=852699), [WWW::Selenium](http://search.cpan.org/~mattp/Test-WWW-Selenium-1.36/lib/WWW/Selenium.pm) and [WWW::Scripter](http://search.cpan.org/~lxp/WWW-Scripter-0.030/lib/WWW/Scripter.pod) offer JavaScript functionality. – Degustaf Jan 26 '15 at 19:06
  • Be aware that WWW::Selenium uses Selenium RC. To use current Selenium (with WebDriver), use [Selenium::Remote::Driver](https://metacpan.org/pod/Selenium::Remote::Driver). I wrote a script with it last week and it works like a charm. – reinierpost Jan 26 '15 at 22:50
  • @Amritha: Your problem with JavaScript is the #1 question asked about Mech. You can see the FAQ here for a list of other modules that handle JavaScript. http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod – Andy Lester Jan 26 '15 at 22:52
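
The mech-dump check suggested in the comments can also be done from inside Perl. Here is a minimal sketch, assuming the HTML::Form module is installed (WWW::Mechanize depends on it for form parsing). The helper name `list_fields` and the two sample HTML strings are made up for illustration; they are not taken from the LexisNexis page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical helper: return one "form-name: type field-name" line for
# every input that exists in the static HTML passed in.
sub list_fields {
    my ($html, $base_url) = @_;
    my @lines;
    for my $form ( HTML::Form->parse($html, $base_url) ) {
        my $form_name = $form->attr('name') // '(unnamed)';
        for my $input ( $form->inputs ) {
            my $field = $input->name() // '(no name)';
            push @lines, sprintf '%s: %s %s', $form_name, $input->type, $field;
        }
    }
    return @lines;
}

# Illustrative input 1: a form that is present in the HTML itself.
my $static_page = '<form name="f" action="/search">'
                . '<input type="text" name="q"></form>';

# Illustrative input 2: a page whose form would only be created by a script.
my $script_page = '<div id="app"></div><script src="app.js"></script>';

print "Static page:\n";
print "  $_\n" for list_fields($static_page, 'http://example.com/');

my @found = list_fields($script_page, 'http://example.com/');
print "Script-built page: ", scalar(@found), " field(s) visible\n";
```

In the script from the question, calling `list_fields($m->content, $m->base)` right after `$m->get($url)` would show which fields, if any, are in the HTML that Mechanize actually received. An empty list would be consistent with the form being built by JavaScript.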

1 Answer

If you disable JavaScript in your browser, you will notice that the search form doesn't load. That means the form is being loaded by JavaScript, and since WWW::Mechanize does not execute JavaScript, it never sees the form. Have a look at WWW::Mechanize::Firefox; it might help you with your task. Check out its example scripts, cookbook, and FAQ.

You can also do the same using Selenium; see Gabor's tutorial on Selenium.
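
For the Selenium route, a rough sketch with Selenium::Remote::Driver might look like the following. It assumes a Selenium server is already running on localhost:4444 (the module's default) with Firefox available. The field name `REPLACE_ME` is a placeholder, since the real name of the search box is not known from this thread; look it up with the browser's element inspector once the page has rendered.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Selenium::Remote::Driver;

my $url = 'http://www.lexisnexis.com/hottopics/lnacademic/?verb=sr&csi=379740';

# Placeholder, NOT the real field name: replace it with the name attribute
# of the search box as shown by the browser's element inspector.
my $field_name = 'REPLACE_ME';

# Connects to a Selenium server on localhost:4444 unless told otherwise.
my $driver = Selenium::Remote::Driver->new( browser_name => 'firefox' );

# Let find_element keep retrying for up to 10 seconds (value is in ms),
# which gives the page's JavaScript time to insert the form.
$driver->set_implicit_wait_timeout(10_000);

$driver->get($url);

# find_element dies if nothing matches, so trap the error and make sure
# the browser session is closed either way.
my $html = eval {
    my $box = $driver->find_element( $field_name, 'name' );
    $box->send_keys('Test');
    $box->submit;
    $driver->get_page_source;
};
my $error = $@;

$driver->quit;
die "Search failed: $error" unless defined $html;
print $html;
```

This is only a starting point. If the results are also loaded by JavaScript, the page source may be read before they appear, so an extra wait for a results element may be needed. If the search box sits inside a frame, the driver has to switch to that frame before `find_element` can see it.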

Chankey Pathak
  • And for using Selenium, [Selenium::Remote::Driver](https://metacpan.org/pod/Selenium::Remote::Driver) is a fine choice, judging by my limited experience. – reinierpost Jan 26 '15 at 22:51
  • Correct, WebDriver is better than RC. – Chankey Pathak Jan 27 '15 at 06:35
  • I wrote an article for [PerlTricks](http://www.perltricks.com) on [using WWW::Mechanize::Firefox](http://perltricks.com/article/138/2014/12/8/Controlling-Firefox-from-Perl) – brian d foy Jan 27 '15 at 13:07