
I want to extract some data from a large-ish (3+ GB, gzipped) FTP download, and do this on the fly, to avoid dumping the full download on my disk.

To extract the desired data I need to examine the uncompressed stream line-by-line.

So I'm looking for the moral equivalent of

use PerlIO::gzip;

my $handle = open '<:gzip', 'ftp://ftp.foobar.com/path/to/blotto.txt.gz'
             or die $!;
for my $line (<$handle>) {
    # etc.
}
close($handle);

FWIW: I know how to open a read handle to ftp://ftp.foobar.com/path/to/blotto.txt.gz (with Net::FTP::retr), but I have not yet figured out how to add a :gzip layer to this open handle.


It took me a lot longer than it should have to find the answer to the question above, so I thought I'd post it for the next person who needs it.

kjo
    Thank you for your question, it has made me think. It's a shame there is no `PerlIO::ftp` that works like Leon Timmermans' [`PerlIO::http`](https://metacpan.org/pod/PerlIO::http). Then you could just write `open '<:gzip:ftp', 'ftp://ftp.foobar.com/path/to/blotto.txt.gz'`. I may look at writing it, but I foresee huge problems with supporting more than a few of the FTP commands, especially as `PerlIO` has no concept of authorization – Borodin Apr 19 '14 at 19:26

2 Answers


OK, the answer is (IMO) not at all obvious: binmode($handle, ':gzip').

Here's a fleshed-out example:

use strict;
use Net::FTP;
use PerlIO::gzip;

my $ftp = Net::FTP->new('ftp.foobar.com') or die $@;
$ftp->login or die $ftp->message;  # anonymous FTP
$ftp->binary;                      # Net::FTP defaults to ASCII mode, which can corrupt gzip data
my $handle = $ftp->retr('/path/to/blotto.txt.gz') or die $ftp->message;

binmode($handle, ':gzip');

# while reads one line at a time; for (<$handle>) would slurp the whole stream first
while (my $line = <$handle>) {
    # etc.
}
close($handle);
kjo

The code below is from the IO::Compress FAQ:

use Net::FTP;
use IO::Uncompress::Gunzip qw(:all);

my $ftp = new Net::FTP ...

my $retr_fh = $ftp->retr($compressed_filename);
gunzip $retr_fh => $outFilename, AutoClose => 1
    or die "Cannot uncompress '$compressed_filename': $GunzipError\n";

To get the data line by line, wrap the handle in an IO::Uncompress::Gunzip object instead and read from that:

use Net::FTP;
use IO::Uncompress::Gunzip qw(:all);

my $ftp = new Net::FTP ...

my $retr_fh = $ftp->retr($compressed_filename);
my $gunzip = IO::Uncompress::Gunzip->new($retr_fh, AutoClose => 1)
    or die "Cannot uncompress '$compressed_filename': $GunzipError\n";

while(<$gunzip>)
{
    ...
}
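
If you want to try the line-by-line approach without an FTP server handy, here is a minimal sketch that swaps the `retr` handle for an in-memory filehandle. The helper name `grep_gzipped_lines`, the sample data, and the in-memory handle are illustrative assumptions, not part of the FAQ; the idea is only that IO::Uncompress::Gunzip accepts any open read handle, so the same loop should apply to the handle returned by `$ftp->retr`.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: wrap an already-open handle that yields gzipped
# bytes and return the decompressed lines matching $pattern. Lines are
# read one at a time, so only the matches are held in memory.
sub grep_gzipped_lines {
    my ($in_fh, $pattern) = @_;
    my $gunzip = IO::Uncompress::Gunzip->new($in_fh)
        or die "Cannot uncompress: $GunzipError\n";
    my @matches;
    while (my $line = <$gunzip>) {
        chomp $line;
        push @matches, $line if $line =~ $pattern;
    }
    $gunzip->close;
    return @matches;
}

# Stand-in for the FTP data connection (an assumption for this demo):
# gzip a few sample lines into a scalar and open a read handle on it.
my $plain = "alpha 1\nbeta 2\nalpha 3\n";
my $compressed;
gzip \$plain => \$compressed
    or die "gzip failed: $GzipError\n";
open my $fake_fh, '<', \$compressed
    or die "Cannot open in-memory handle: $!";
binmode $fake_fh;

my @hits = grep_gzipped_lines($fake_fh, qr/^alpha/);
print "$_\n" for @hits;
```

With a real download you would pass the `retr` handle in place of `$fake_fh`; for a multi-gigabyte file it is probably better to process each line inside the loop than to collect matches in an array.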
pmqs