I need to write a Perl script that reads the paths of gzipped files from a text file, concatenates their contents, and writes the result to a new gzipped file. (It has to be Perl because it will run as part of a pipeline.) I am not sure how to do the zcat and concatenation part. The files are several GB each, so I also need to keep storage use and run time under control.

This is what I have so far:

use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError) ;

#-------check the input file specified-------------#

$num_args = $#ARGV + 1;
if ($num_args != 1) {
    print "\nUsage: name.pl Filelist.txt \n";
exit;

$file_list = $ARGV[0];

#-------------Read the file into array-------------#

my @fastqc_files;   #Array that contains gzipped files 
use File::Slurp;
my @fastqc_files = $file_list;


#-------use the zcat over the array contents 
my $outputfile = "combined.txt"
open(my $combined_file, '>', $outputfile) or die "Could not open file '$outputfile' $!";

for my $fastqc_file (@fastqc_files) {

    open(IN, sprintf("zcat %s |", $fastqc_file)) 
      or die("Can't open pipe from command 'zcat $fastqc_file' : $!\n");
    while (<IN>) {
        while ( my $line = IN ) {
          print $outputfile $line ;
        }
    }
    close(IN);

my $Final_combied_zip = new IO::Compress::Gzip($combined_file);
  or die "gzip failed: $GzipError\n";

I cannot get this to run. I would also appreciate guidance on the correct way to write the gzipped output file.

Thanks!

AnkP
  • Have you tried `zcat file1 file2 file3 ... filen | gzip > out.gz` (untested)? – Sinan Ünür Dec 01 '15 at 15:42
  • Have you tried anything yet? Because some attempt at doing so is definitely going to elicit better answers. There are modules to do this without too much difficulty. Or there's `open` of an exec pipe – Sobrique Dec 01 '15 at 15:53
  • @Sobrique I tried to use zcat to read the gzip file, but I am not sure whether I can simply concatenate its output with that of each other gzip file read from the list – AnkP Dec 01 '15 at 22:06

2 Answers

You don't need Perl for this. You don't even need zcat/gzip, because gzip files can be concatenated directly with cat:

cat $(cat pathfile) >resultfile
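If the pipeline requires Perl, the same byte-for-byte concatenation can be sketched as below. This is a minimal illustration, not part of the original answer: the helper name `concat_gzip_members` and the 1 MiB chunk size are assumptions, and the list file is assumed to hold one path per line. Nothing is decompressed, so memory use stays at one chunk regardless of file size.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append every gzip file named in $list_path,
# byte for byte, to $out_path. No data is decompressed.
sub concat_gzip_members {
    my ($list_path, $out_path) = @_;

    open(my $list, '<', $list_path)   or die "Cannot open '$list_path': $!\n";
    open(my $out, '>:raw', $out_path) or die "Cannot open '$out_path': $!\n";

    while (my $path = <$list>) {
        chomp $path;
        next unless length $path;    # skip blank lines in the list

        open(my $in, '<:raw', $path) or die "Cannot open '$path': $!\n";
        # Copy in 1 MiB chunks (an arbitrary size chosen for this sketch).
        while (read($in, my $buf, 1 << 20)) {
            print {$out} $buf or die "Cannot write to '$out_path': $!\n";
        }
        close($in);
    }

    close($list);
    close($out) or die "Cannot close '$out_path': $!\n";
    return;
}

# Usage: script.pl pathfile resultfile
concat_gzip_members(@ARGV) if @ARGV == 2;
```

The result contains one gzip member per input file. gunzip and zcat read all members in sequence, but a reader that stops after the first member will see only the first file's data.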

But if you really need the extra compression that comes from recompressing everything as a single stream:

zcat $(cat pathfile)|gzip >resultfile
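A Perl equivalent of that pipeline can be sketched with the core modules IO::Uncompress::Gunzip and IO::Compress::Gzip. This is a hedged illustration rather than the answer's own method: the helper name `recompress_list` is hypothetical, and the list file is assumed to hold one path per line. Data is streamed block by block, so no uncompressed intermediate file is written, and the output is a single gzip member.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip     qw($GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: decompress every file named in $list_path and
# recompress the combined data into $out_path as one gzip member.
sub recompress_list {
    my ($list_path, $out_path) = @_;

    open(my $list, '<', $list_path) or die "Cannot open '$list_path': $!\n";

    my $gz_out = IO::Compress::Gzip->new($out_path)
        or die "Cannot write '$out_path': $GzipError\n";

    while (my $path = <$list>) {
        chomp $path;
        next unless length $path;    # skip blank lines in the list

        # MultiStream => 1 keeps reading if an input file is itself
        # a concatenation of several gzip members.
        my $gz_in = IO::Uncompress::Gunzip->new($path, MultiStream => 1)
            or die "Cannot read '$path': $GunzipError\n";

        my ($buf, $status);
        # read() returns the number of uncompressed bytes placed in $buf,
        # 0 at end of file, and a negative value on error.
        while (($status = $gz_in->read($buf)) > 0) {
            $gz_out->print($buf) or die "Cannot write '$out_path': $GzipError\n";
        }
        die "Error reading '$path': $GunzipError\n" if $status < 0;
        $gz_in->close();
    }

    close($list);
    $gz_out->close() or die "Cannot finish '$out_path': $GzipError\n";
    return;
}

# Usage: script.pl pathfile resultfile
recompress_list(@ARGV) if @ARGV == 2;
```

Because the output holds a single member, a downstream program that reads only the first gzip member still sees all of the data. The cost is the CPU time for decompressing and recompressing every file.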

Adding: Also note the very first "related" link on the right, which seems to already answer this very question: How to concat two or more gzip files/streams

Jeff Y
  • Wouldn't the headers and trailer (CRC32 checksum) get mixed up if you just cat a bunch of gzip files into one gzip file? Edit: It will, but gunzip will still be able to unzip the one large gzip file correctly. – simon Dec 01 '15 at 15:54
  • Not according to the link at the linked question: `The gzip manual says that two gzip files can be concatenated as you attempted.` http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage – Jeff Y Dec 01 '15 at 15:55
  • I need to read the combined file in another program that must treat it as one single gzip file, so I cannot simply concatenate the gzip files – AnkP Dec 01 '15 at 22:07

Thanks for the replies. The script runs well now:

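#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;
use IO::Compress::Gzip qw(gzip $GzipError);

# The list file holds one gzipped file path per line.
my @data = read_file('./File_list.txt');
my $out = "./test.txt";

# Start from an empty output file, because the loop below appends to it.
unlink $out;

foreach my $data_file (@data)
{
    chomp($data_file);
    next unless length $data_file;    # skip blank lines in the list
    # The path is interpolated into a shell command, so it must not
    # contain spaces or shell metacharacters.
    system("zcat $data_file >> $out") == 0
        or die "zcat failed for '$data_file': $?\n";
}

# Compress the combined text into a single gzip file.
my $outzip = "./test.gz";
gzip $out => $outzip
    or die "gzip failed: $GzipError\n";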
AnkP