I am trying to split a large file into separate files, one per sample (the name in the last column of each #CHROM header line), so that each output file contains only the records for that sample.
My input file looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PID008SM
...info here 1.....
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CL001-SC
....info here 2....
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CL001-SC
....info here 3....
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PID008SM
....info here 4....
In this case I would like to create two output files (one for PID008SM and one for CL001-SC), each containing only the information related to that sample.
Output for CL001-SC:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CL001-SC
....info here 2...
....info here 3...
Output for PID008SM:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PID008SM
....info here 1....
....info here 4....
The script that I have used is in Perl, but any suggestion is more than welcome. Thank you in advance.
Code:
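#!/usr/bin/perl
use strict;
use warnings;
my ($file1, $file2) = @ARGV;
# Three-argument open with lexical filehandles; stop if a file cannot be read
open(my $f1, '<', $file1) or die "Cannot open $file1: $!";  # first .vcf file for comparison
open(my $f2, '<', $file2) or die "Cannot open $file2: $!";  # second .vcf file for comparison
my %file;
## Create a hash key for each line of file2
while (my $line = <$f2>) {
    $file{$line} = '';
}
## Print the line if the key exists in the hash
while (my $string = <$f1>) {
    # Look up $string itself ($_ is not set here); the regex only matches lines with two '#'
    if (exists $file{$string} and $string =~ /(#)(.+?)(#)/s) {
        print $string;
    }
}
close $f1;
close $f2;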
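For reference, here is a minimal sketch of the splitting I am after, not a finished solution. It assumes that every block starts with a line beginning with #CHROM, that the sample name is the last whitespace-separated column of that line, and that any lines before the first #CHROM header can be ignored. The sub names group_by_sample and write_samples and the "<sample>.vcf" output naming are placeholders of my own, and it needs Perl 5.10 or later for the //= operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group the lines of a concatenated file by sample name, i.e. the last
# whitespace-separated column of each "#CHROM" header line.
# Returns (\%header, \%records): the header line and data lines per sample.
sub group_by_sample {
    my ($in) = @_;
    my (%header, %records);
    my $sample;    # sample of the block currently being read
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^#CHROM\s/) {
            $sample = (split ' ', $line)[-1];
            $header{$sample}  //= $line;    # keep the first header seen
            $records{$sample} //= [];
        }
        elsif (defined $sample) {
            push @{ $records{$sample} }, $line;
        }
        # lines before the first #CHROM header are skipped (assumption)
    }
    return (\%header, \%records);
}

# Write one file per sample into $dir, named "<sample>.vcf" (hypothetical naming).
sub write_samples {
    my ($header, $records, $dir) = @_;
    for my $sample (sort keys %$header) {
        my $path = "$dir/$sample.vcf";
        open my $out, '>', $path or die "Cannot write $path: $!";
        print {$out} "$_\n" for $header->{$sample}, @{ $records->{$sample} };
        close $out or die "Cannot close $path: $!";
    }
}

# Only run when an input file is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my ($header, $records) = group_by_sample($in);
    close $in;
    write_samples($header, $records, '.');
}
```

Saved under a hypothetical name such as split_by_sample.pl, this would be run as "perl split_by_sample.pl input.vcf" and, for the example input above, should write CL001-SC.vcf and PID008SM.vcf to the current directory. Everything is held in memory before writing, which may not suit a very large file; printing to one open filehandle per sample while reading would avoid that.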