4

I am trying to fetch an HTTPS URL through a proxy using WWW::Mechanize, but it fails. My code:

use strict;
use warnings;

use IO::Socket::SSL;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;
$mech->proxy(['https','http'], 'http://proxy:8080/');
$mech->get('https://www.google.com');

print $mech->content;

Error:

Error GETing https://www.google.com: Bad Request at perl4.pl line 9.

When I use LWP::UserAgent instead, I can fetch the HTTPS URL without any error:

use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
$ua->proxy(['https','http'], 'http://proxy:8080/');
$ua->get('https://www.google.com');

Can anyone help out with this?

Currently using Mechanize 1.72

chrsblck
user2763829
  • Works for me (without proxy). Ask the system admin that is responsible for the proxy to assist you in debugging. – daxim Sep 10 '13 at 09:22
  • I tried the same proxy in the LWP::UserAgent code and could fetch the HTTPS page without any Bad Request error, so I assumed something is wrong with my code rather than with the proxy. Could you explain why a proxy works with one library but not the other? Since Mechanize is a subclass of LWP::UserAgent, I would guess that the get method in Mechanize is the same as the UserAgent get method. Sorry for the noob question. This is the first time I am working with proxies and Perl. – user2763829 Sep 10 '13 at 13:35
  • Works for me (with proxy). One reason could be that the proxy denies requests if the User-Agent header matches WWW-Mechanize. – Slaven Rezic Sep 10 '13 at 15:58
  • Vaguely remembered something on perlmonks. Either of these look applicable? http://www.perlmonks.org/?node_id=643830 http://www.perlmonks.org/?node_id=896895 – Richard Huxton Sep 10 '13 at 20:36
  • I tried debugging and got the following output: http://pastebin.com/sNxqfD20 In the WWW::Mechanize::request debug logs, it looks like it is getting an http request instead of https. – user2763829 Sep 11 '13 at 02:04
  • I have just found that the LWP::UserAgent code is also returning a Bad Request error, and I have just found a bug report on LWP: [link](https://rt.cpan.org/Public/Bug/Display.html?id=1894) Is anyone able to connect to an https site via a proxy server using UserAgent or Mechanize? – user2763829 Oct 29 '13 at 00:45

3 Answers

4

WWW::Mechanize is based on LWP::UserAgent, which for years has had a strange idea of https proxy requests: instead of using a CONNECT request to build a tunnel and then upgrading to SSL, it sends a GET request with an https URL. See https://rt.cpan.org/Ticket/Display.html?id=1894
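To illustrate the difference described above, here is a minimal sketch that only builds and prints the first request each approach would send to the proxy. The helper names `connect_request` and `legacy_get_request` are hypothetical, the header set is reduced to the bare minimum, and nothing is sent over the network.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only: plain string building, no network I/O.

# A tunnelling client first asks the proxy for a raw tunnel to host:port.
# The TLS handshake with the target server happens inside that tunnel afterwards.
sub connect_request {
    my ($host, $port) = @_;
    return "CONNECT ${host}:${port} HTTP/1.1\r\nHost: ${host}:${port}\r\n\r\n";
}

# The behaviour the answer attributes to older LWP: an ordinary proxy-style GET
# carrying the full https URL, sent to the proxy without any tunnel.
sub legacy_get_request {
    my ($host, $path) = @_;
    return "GET https://${host}${path} HTTP/1.1\r\nHost: ${host}\r\n\r\n";
}

print connect_request('www.google.com', 443);
print legacy_get_request('www.google.com', '/');
```

A proxy that only accepts CONNECT for https targets may well reject the second form, which would be consistent with the Bad Request error in the question.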

A fix has finally been merged into the libwww-perl GitHub repository, but I don't know when a new version of LWP will be released. In the meantime you might use Net::SSLGlue::LWP, which monkey patches LWP to provide proper support for https proxies (I'm the author of Net::SSLGlue::LWP and of the fixes to LWP).
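A minimal sketch of how the workaround might be applied to the script from the question, assuming Net::SSLGlue::LWP is installed and that simply loading it is enough to patch LWP. The proxy URL is the one from the question; depending on the module version, certificate verification options may also need to be passed on the `use` line, which is not shown here.

```perl
use strict;
use warnings;

# Assumption: loading the module patches LWP's https handling,
# so it is loaded before the first request is made.
use Net::SSLGlue::LWP;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;

# Same proxy setting as in the question.
$mech->proxy(['https', 'http'], 'http://proxy:8080/');

# With the patch in place, the https request should be tunnelled via CONNECT.
$mech->get('https://www.google.com');

print $mech->content;
```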

Steffen Ullrich
0

I would guess, based on the error you provided, that your proxy is blocking a certain User-Agent. The HTTP User-Agent header sent by LWP::UserAgent is different from the one sent by WWW::Mechanize.
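One quick way to check this guess is to print the default agent string of each class. This is only a diagnostic sketch; the exact strings depend on the installed versions, so no output is shown.

```perl
use strict;
use warnings;

use LWP::UserAgent;
use WWW::Mechanize;

# agent() with no argument returns the current User-Agent string.
my $ua   = LWP::UserAgent->new;
my $mech = WWW::Mechanize->new;

print "LWP::UserAgent default agent: ", $ua->agent,   "\n";
print "WWW::Mechanize default agent: ", $mech->agent, "\n";
```

If the two strings differ and the proxy filters on this header, overriding the agent is a reasonable next thing to try.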

I suggest trying this line:

my $mech = WWW::Mechanize->new( agent => 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36' );

This will make the proxy and the receiving server believe you are a Chrome browser rather than some sort of crawler, malware, virus, etc.

Another suggestion is to run Data::Dumper on the $mech object and confirm what is "inside":

use Data::Dumper;
print Dumper($mech);

You can also use this same method to dump the content of $mech after the get() function call.

I am not sure it is relevant, but note that not all proxies support HTTPS/SSL; only those that allow inline proxying/CONNECT proxying will let you proxy HTTPS/SSL traffic.

Noam Rathaus
0

I installed LWP-Protocol-connect-6.03 and now connect to the proxy with:

$https_proxy = 'connect://proxy:8080/';

It is working fine now :D
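For completeness, here is a sketch of how the `connect://` proxy scheme might be used with the Mechanize script from the question. It assumes the LWP-Protocol-connect distribution is installed (it provides the handler for that scheme) and sets the proxy through the `proxy` method rather than through a variable; host and port are the ones used above.

```perl
use strict;
use warnings;

use WWW::Mechanize;

my $mech = WWW::Mechanize->new;

# The connect:// scheme is handled by LWP::Protocol::connect, which LWP
# loads on demand; it makes LWP open a CONNECT tunnel through the proxy.
$mech->proxy('https', 'connect://proxy:8080/');

$mech->get('https://www.google.com');

print $mech->content;
```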

user2763829