Why does WWW::Mechanize return blank content after getting the following URL? A browser or curl retrieves the full HTML page.

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;
$mech->get("http://www.belizejudiciary.org/web/judgements2/");
print $mech->content;  # prints nothing

Here is the dump of the response:

HTTP/1.1 200 OK
Connection: close
Date: Fri, 10 Feb 2017 00:51:47 GMT
Server: Apache/2.4
Content-Type: text/html; charset=UTF-8
Client-Aborted: die
Client-Date: Fri, 10 Feb 2017 00:51:48 GMT
Client-Peer: 98.129.229.64:80
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
Link: <http://www.belizejudiciary.org/web/wp-json/>; rel="https://api.w.org/"
Link: <http://www.belizejudiciary.org/web/?p=468>; rel=shortlink
Set-Cookie: X-Mapping-hepadkon=FAB86566672CEB74D66B2818CA030616; path=/
X-Died: Illegal field name 'X-Meta-Twitter:title' at /usr/local/lib/perl5/site_perl/5.16.3/sun4-solaris/HTML/HeadParser.pm line 207.
X-Pingback: http://www.belizejudiciary.org/web/xmlrpc.php

I have version 3.70 of HTML::Parser installed.

ThisSuitIsBlackNot
CJ7
  • I'm not getting blank content. What version of the module do you use? What does `use Data::Dumper; print Dumper($mech->response)` print? – choroba Feb 10 '17 at 00:48
  • @choroba My system has version 3.70 of `HTML::Parser`, so that might be a problem. See my edit for the response. – CJ7 Feb 10 '17 at 00:52
  • @choroba adding `$mech->parse_head(0)` before the `get` solved the problem. From this answer: http://stackoverflow.com/a/17745491/327528 – CJ7 Feb 10 '17 at 01:00
  • I feel like this is a duplicate of [that other SO question](http://stackoverflow.com/q/14740365) you found, but those answers both leave something to be desired. I mean, monkey-patching HTML::HeadParser? Disabling header parsing altogether? Really? – ThisSuitIsBlackNot Feb 10 '17 at 01:27
  • @ThisSuitIsBlackNot In my opinion that is the problem in marking questions as duplicates. – CJ7 Feb 10 '17 at 02:27
  • It might be better to mark that question as a dup of this one. Your question is better since it includes the actual error. – ThisSuitIsBlackNot Feb 10 '17 at 02:32

1 Answer

Your dump shows that there was an error parsing the response:

X-Died: Illegal field name 'X-Meta-Twitter:title' at /usr/local/lib/perl5/site_perl/5.16.3/sun4-solaris/HTML/HeadParser.pm line 207.

This is caused by a bug in HTML::HeadParser:

<meta> tags can have name attributes with colons in them, and this is perfectly valid. But HTML::HeadParser then tries to register these as X-Meta-<name> headers using HTTP::Headers, so a name such as twitter:title becomes the field name X-Meta-Twitter:title. Newer versions of HTTP::Headers (since 6.05) check field names more strictly and reject any that contain colons.
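The interaction can be illustrated with a minimal, core-Perl sketch. Both helper functions are hypothetical stand-ins written for this illustration, not the actual code of HTML::HeadParser or HTTP::Headers; they only mimic the naming rule and the colon check described above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the header name the way the answer describes,
# i.e. "X-Meta-" followed by the <meta> name with its first letter uppercased.
sub meta_header_name {
    my ($name) = @_;
    return "X-Meta-\u$name";
}

# Hypothetical stand-in for the stricter check: reject names containing a colon.
sub check_field_name {
    my ($field) = @_;
    die "Illegal field name '$field'\n" if $field =~ /:/;
    return $field;
}

my $field = meta_header_name('twitter:title');

# Trap the failure, much as the X-Died header records it in the response dump.
my $ok    = eval { check_field_name($field); 1 };
my $error = $ok ? '' : $@;

print "header: $field\n";
print "error:  $error";
```

The trapped message has the same shape as the X-Died line in the dump, which is why the body stays empty even though the status is 200 OK.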

This was fixed in version 3.71 of the HTML-Parser distribution, which contains HTML::HeadParser, so you should upgrade from 3.70.
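If upgrading is not possible right away, the workaround mentioned in the comments (`parse_head(0)`) can be applied only when the installed parser predates the fix. The following is a rough sketch, not a definitive implementation: `has_head_parser_bug` is a hypothetical helper name, and passing the URL as a command-line argument is an assumption made for illustration.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: true if the given HTML::Parser version predates 3.71,
# the release the answer names as containing the fix.
sub has_head_parser_bug {
    my ($installed) = @_;
    return version->parse($installed) < version->parse('3.71');
}

# Assumption: the URL is passed as the first command-line argument.
if (@ARGV) {
    require HTML::Parser;
    require WWW::Mechanize;

    my $mech = WWW::Mechanize->new;

    # Workaround from the comments: skip <head> parsing on affected versions.
    $mech->parse_head(0) if has_head_parser_bug(HTML::Parser->VERSION);

    $mech->get($ARGV[0]);
    print $mech->content;
}
```

Disabling head parsing means the `<meta>` tags are never turned into headers, so the colon check is never reached; upgrading remains the cleaner fix.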

ThisSuitIsBlackNot