I have a huge webpage, about 5 GB in size, and I would like to read its content directly (remotely) without downloading the whole file. I have used the `open` file handler to open the HTTP content, but the error message given is `No such file or directory`. I tried to use `LWP::Simple`, but it ran out of memory when I used `get` to fetch the whole content. I wonder if there is a way to `open` this content remotely and read it line by line.
Thank you for your help.

- Is it a static webpage or dynamically generated? – mvp Jan 31 '13 at 06:13
- It is static; it is a log file of about 5 GB. `LWP::Simple` simply generates an "Out of memory" error. – Chris Andrews Jan 31 '13 at 06:23
2 Answers
You could try using `LWP::UserAgent`. The `request` method allows you to specify a CODE reference, which would let you process the data as it's coming in.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent ();
use HTTP::Request ();

my $request = HTTP::Request->new(GET => 'http://www.example.com/');
my $ua = LWP::UserAgent->new();

# The second argument is a content callback: LWP invokes it for each
# chunk of the response body as it arrives, so the whole body never
# has to fit in memory.
$ua->request($request, sub {
    my ($chunk, $res) = @_;
    print $chunk;
    return undef;
});
Technically the function should return the content instead of undef, but it seems to work if you return undef. According to the documentation:
The "content" function should return the content when called. The content function will be invoked repeatedly until it returns an empty string to signal that there is no more content.
I haven't tried this on a large file, and you would need to write your own code to handle the data coming in as arbitrarily sized chunks.
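If you need line-by-line processing, as the question asks, one way is to buffer the incoming chunks and hand off complete lines as they become available. Here is a minimal sketch along those lines; `process_line` is a hypothetical placeholder for your own per-line logic:
use strict;
use warnings;
use LWP::UserAgent ();
use HTTP::Request ();

my $request = HTTP::Request->new(GET => 'http://www.example.com/');
my $ua = LWP::UserAgent->new();

my $buffer = '';
$ua->request($request, sub {
    my ($chunk, $res) = @_;
    $buffer .= $chunk;
    # Hand off every complete line; keep any trailing partial line buffered.
    while ($buffer =~ s/^(.*\n)//) {
        process_line($1);
    }
    return undef;
});
# Flush a final line that lacks a trailing newline.
process_line($buffer) if length $buffer;

sub process_line {
    my ($line) = @_;
    print $line;    # replace with your own per-line logic
}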

- Nice job, thank you! It works on my logs via HTTP. But I still wonder what the `$res` in your code stands for. Thanks. – Chris Andrews Jan 31 '13 at 06:30
- Whoops, I left that in there accidentally. It is a reference to the `HTTP::Response` object, which might be handy. I'll leave it in my answer for now. – chipschipschips Jan 31 '13 at 06:42
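For instance, `$res` can be consulted inside the callback, e.g. to track progress against the size the server advertised. A minimal sketch, assuming the server sends a `Content-Length` header:
my $received = 0;
$ua->request($request, sub {
    my ($chunk, $res) = @_;
    my $total = $res->header('Content-Length');    # may be undef
    $received += length $chunk;
    printf STDERR "\rreceived %d of %s bytes", $received,
        defined $total ? $total : '?';
    print $chunk;
});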
This Perl code will download a file from a URL, resuming the download if the file is already partially present. It requires that the server return the file size (the `Content-Length` header) in response to a `HEAD` request, and that the server support byte ranges on the URL in question.
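Both requirements can be checked up front with a single `HEAD` request; a small sketch (the URL is a placeholder):
use strict;
use warnings;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $res = $ua->head('http://example.com/huge-file.bin');
die "HEAD request failed: " . $res->status_line . "\n" unless $res->is_success;

my $size   = $res->header('Content-Length');
my $ranges = $res->header('Accept-Ranges');
die "No Content-Length in HEAD response\n" unless defined $size;
# Some servers honor Range without advertising Accept-Ranges,
# so treat a missing header as a hint rather than proof.
warn "Server does not advertise byte-range support\n"
    unless defined $ranges && $ranges eq 'bytes';
print "Size: $size bytes\n";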
If you want some special processing for each chunk, just override it below:
use strict;
use warnings;
use IO::Handle;                  # for $fh->flush
use LWP::UserAgent;
use List::Util qw(min);

my $url  = "http://example.com/huge-file.bin";
my $file = "huge-file.bin";
DownloadUrl($url, $file);

sub DownloadUrl {
    my ($url, $file, $chunksize) = @_;
    $chunksize ||= 1024*1024;
    my $ua = LWP::UserAgent->new;

    # Ask the server for the total size up front.
    my $res  = $ua->head($url);
    my $size = $res->header('Content-Length');
    die "Cannot get size for $url" unless defined $size;

    # Append mode, so an interrupted download resumes where it left off.
    open my $fh, '>>', $file or die "ERROR: $!";
    binmode $fh;
    for (;;) {
        $fh->flush;
        my $range1 = -s $fh;                          # bytes we already have
        my $range2 = min($range1 + $chunksize, $size);
        last if $range1 == $range2;                   # download complete
        # HTTP byte ranges are inclusive, hence the "- 1".
        $res = $ua->get($url, Range => "bytes=$range1-" . ($range2 - 1));
        # A ranged request should answer "206 Partial Content".
        last unless $res->is_success();
        # process next chunk:
        print $fh $res->content();
    }
    close $fh;
}
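As an example of such an override, a hypothetical `process_chunk` hook could replace (or precede) the `print $fh ...` line at the `# process next chunk` marker. The sketch below scans a streamed log for "ERROR" lines, carrying partial lines across chunk boundaries:
# Hypothetical per-chunk hook; call it from the loop at the
# "# process next chunk" marker, e.g.:  process_chunk($res->content());
my $pending = '';    # carries a partial line across chunk boundaries
sub process_chunk {
    my ($chunk) = @_;
    $pending .= $chunk;
    # Emit complete lines only; a chunk may end mid-line.
    while ($pending =~ s/^(.*\n)//) {
        my $line = $1;
        print STDERR $line if $line =~ /ERROR/;
    }
}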
