Saving partial content when WWW::Mechanize GET times out

Question

I'm using the following Perl code to get data from https://www.otcmarkets.com/research/stock-screener/api?sortField=symbol&sortOrder=asc&page=0&pageSize=20000:

use warnings;
use WWW::Mechanize::GZip;

my $TempFilename = "D:\\temp\\test.txt";

my $mech = WWW::Mechanize::GZip->new(
    ssl_opts => {
        verify_hostname => 0,
    },
);

$mech->get("https://www.otcmarkets.com/research/stock-screener/api?sortField=symbol&sortOrder=asc&page=0&pageSize=20000");
open(OUT, ">", $TempFilename);
binmode(OUT, ":utf8");
print OUT $mech->content;
close(OUT);

Unfortunately the request always times out, and my temporary file always contains

read timeout at C:/Strawberry/perl/vendor/lib/Net/HTTP/Methods.pm line 268.

However, if I point a web browser to the same URL, I get a bunch of JSON data that looks like this, which is what I am seeking:

"{\"count\":17114,\"pages\":1,\"stocks\":[{\"securityId\":194057,\"reportDate\":\"Jan 26, 2022 12:00:00 AM\",\"symbol\":\"AAAIF\",\"securityName\":\"ALTERNATIVE INVESTMENT TR\",\"market\":\"Pink ...

My question is whether there is any way I can modify my script so that it saves the same data that my web browser is able to display instead of the timeout message to my file.

Thanks

You can probably use `:content_cb`. See the LWP::UserAgent docs (of which WWW::Mechanize is a subclass). — ikegami, Jan 27 '22 at 22:12
Is there any valid reason you chose [WWW::Mechanize::GZip](https://metacpan.org/pod/WWW::Mechanize::GZip) as the method to capture the generated JSON? — Polar Bear, Jan 28 '22 at 01:11
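
As a rough illustration of the `:content_cb` suggestion above, the sketch below streams the response body to a file chunk by chunk, so whatever arrived before a read timeout is already on disk. It uses LWP::UserAgent directly rather than WWW::Mechanize; the helper name `save_streamed` and the command-line arguments are made up for this example. It assumes that LWP reports an error raised during the body transfer in the `X-Died` response header, which matches the documented behaviour when a callback dies. If the timeout hits before any body data arrives, the callback never runs and the file stays empty.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: stream the body of $url into $path as it arrives.
# Returns the HTTP::Response object and the number of bytes written.
sub save_streamed {
    my ($ua, $url, $path) = @_;

    # Chunks are raw bytes as received, so write them without an encoding layer.
    open my $out, '>:raw', $path or die "Cannot open $path: $!";

    my $bytes = 0;
    my $response = $ua->get(
        $url,
        ':content_cb' => sub {
            my ($chunk, $res, $protocol) = @_;
            print {$out} $chunk;
            $bytes += length $chunk;
        },
    );
    close $out or die "Cannot close $path: $!";

    return ($response, $bytes);
}

# Run as: perl save_streamed.pl URL OUTPUT_FILE
if (@ARGV == 2) {
    my ($url, $path) = @ARGV;
    my $ua = LWP::UserAgent->new(agent => 'Mozilla/5.0', timeout => 15);
    my ($response, $bytes) = save_streamed($ua, $url, $path);

    if (my $died = $response->header('X-Died')) {
        # The transfer was interrupted part-way; the file holds the partial body.
        warn "Transfer aborted after $bytes bytes: $died\n";
    }
    elsif (!$response->is_success) {
        warn "Request failed: ", $response->status_line, "\n";
    }
    else {
        print "Saved $bytes bytes to $path\n";
    }
}
```

Because the callback consumes the body, `$response->content` is not populated with it here; the file is the only copy of the data.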

Miguel Prz · Answer 1 · 2022-01-28T07:24:55.947

Change the user agent. The default is a string of the form libwww-perl/#.###, and some sites are sensitive to that. You can also use WWW::Mechanize directly and set an explicit timeout (in seconds), like this:

use strict;
use warnings;
use WWW::Mechanize;

my $TempFilename = "c:\\temp\\test.txt";
my $url = "https://www.otcmarkets.com/research/stock-screener/api?sortField=symbol&sortOrder=asc&page=0&pageSize=20000";

my $mech = WWW::Mechanize->new(
    agent    => "Mozilla/5.0",
    timeout  => 15,
    # ssl_opts => { verify_hostname => 0 },
);

$mech->get($url);
open my $f_out, ">", $TempFilename or die "Cannot open $TempFilename: $!";
binmode $f_out, ":utf8";
print $f_out $mech->content;
close $f_out;

Adding the parameter `agent => "Mozilla/5.0"` does the trick. — Polar Bear, Jan 28 '22 at 07:25
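
To keep an error message such as "read timeout" from ending up in the output file, the request can be checked before anything is written. The following is a minimal sketch of that idea, assuming WWW::Mechanize is created with `autocheck => 0` so that `get` returns the response instead of dying; the helper name `fetch_to_file` and the command-line arguments are invented for this example.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url and write the decoded body to $path,
# but only when the request succeeded. Returns 1 on success, 0 otherwise.
sub fetch_to_file {
    my ($mech, $url, $path) = @_;

    # With autocheck disabled, a failed request does not die here.
    my $response = $mech->get($url);
    unless ($mech->success) {
        warn "Request failed: ", $response->status_line, "\n";
        return 0;
    }

    open my $out, '>:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    print {$out} $mech->content;
    close $out or die "Cannot close $path: $!";
    return 1;
}

# Run as: perl fetch_to_file.pl URL OUTPUT_FILE
if (@ARGV == 2) {
    my $mech = WWW::Mechanize->new(
        agent     => 'Mozilla/5.0',
        timeout   => 15,
        autocheck => 0,
    );
    exit(fetch_to_file($mech, @ARGV) ? 0 : 1);
}
```

The output file is only opened after the success check, so a failed request leaves any existing file untouched.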
