I'm using the code below to search Google Scholar from my website. It works once or twice, and then I get the error "Error GETing http://scholar.google.com: Can't connect to scholar.google.com:80 (Permission denied)". The code is as follows:

use strict;
use WWW::Mechanize;
my $browser = WWW::Mechanize->new();
$browser->get('http://scholar.google.com');
$browser->form_name('f');
$browser->field('q','PCR');
$browser->submit();
print $browser->content();

Any tips or advice would be greatly appreciated.

neemie

1 Answer

Your code is fine, but Google Scholar does not allow access by "bots" such as LWP; see perlmonks/461130 for more information.

Edit: I found a solution by passing a User-Agent string and a cookie ID in the request headers:

use HTTP::Request;
use HTTP::Cookies;
use LWP::UserAgent;

# randomize cookie id
use Digest::MD5 qw(md5_hex);
my $googleid = md5_hex(rand());

# escape query string
use URI::Escape;
my $query= uri_escape('search string');

# create request
my $request = HTTP::Request->new(GET => 'http://scholar.google.com/scholar?q='.$query);

# disguise as Mozilla
my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0');

# use random id for Cookie
my $cookies = HTTP::Cookies->new();
$cookies->set_cookie(0,'GSP', 'ID='.$googleid,'/','scholar.google.com');
$ua->cookie_jar($cookies);

# submit request
my $response = $ua->request($request);
if($response->is_success){
    print $response->code;
    my $text = $response->decoded_content;
    # do something
}
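
For reference, the same idea can probably be carried over to WWW::Mechanize, the module used in the question. The following is only a sketch under a few assumptions: the helper name build_scholar_browser is hypothetical, the search URL is the one used above, and nothing here guarantees that Google Scholar will accept the requests.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use Digest::MD5 qw(md5_hex);
use URI::Escape;

# Hypothetical helper (not part of the original answer): build a
# Mechanize object that carries a browser-like User-Agent string and
# a randomly generated GSP cookie.
sub build_scholar_browser {
    my $cookies = HTTP::Cookies->new();
    $cookies->set_cookie(0, 'GSP', 'ID=' . md5_hex(rand()),
                         '/', 'scholar.google.com');

    return WWW::Mechanize->new(
        agent      => 'Mozilla/5.0',
        cookie_jar => $cookies,
        autocheck  => 0,    # return failed responses instead of dying
    );
}

# Only touch the network when a search term is given on the command line.
if (@ARGV) {
    my $browser = build_scholar_browser();
    $browser->get('http://scholar.google.com/scholar?q=' . uri_escape($ARGV[0]));

    if ($browser->success) {
        print $browser->content();
    }
    else {
        warn 'Request failed: ', $browser->response->status_line, "\n";
    }
}
```

With autocheck switched off, a refused connection shows up as a failed response that can be inspected rather than as a fatal error, which makes it easier to see when access is being blocked.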
Martin