I am a beginner in bioinformatics and have been working on a small BioPerl script to split my paired-end MiSeq data (currently in one FASTQ file) into two files, each containing one end of the pair. The two ends of each pair can be distinguished by the 1 or 2 that follows the space in the FASTQ header. The file follows the typical FASTQ format; here is an example from running "head" on the command line:
@M00763:6:000000000-A1U80:1:1101:12620:1732 1:N:0:1
TTATACTC
+
@A@AA@A@
@M00763:6:000000000-A1U80:1:1101:12620:1732 2:N:0:1
T
+
E
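To make the header layout concrete, here is a minimal sketch in plain Perl (no BioPerl needed) that pulls the read number out of a header line. The helper name read_number is hypothetical, and the pattern assumes headers shaped like the two shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the read number (1 or 2) from a FASTQ
# header line, or undef if the header does not have the expected shape.
sub read_number {
    my ($header) = @_;
    # Expected shape: @<instrument fields><space><read>:<more fields>
    return $header =~ /^\@\S+\s+([12]):/ ? $1 : undef;
}

print read_number('@M00763:6:000000000-A1U80:1:1101:12620:1732 1:N:0:1'), "\n";  # prints 1
print read_number('@M00763:6:000000000-A1U80:1:1101:12620:1732 2:N:0:1'), "\n";  # prints 2
```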
I have written code that tries to match the 1 or 2 in the header. Although I am using Bio::SeqIO, Perl does not seem to recognize the FASTQ format, and I keep getting this error:
MSG: Could not guess format from file/fh
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.12.3/Bio/Root/Root.pm:472
STACK: Bio::SeqIO::new /sw/lib/perl5/5.12.3/Bio/SeqIO.pm:389
STACK: SplitPairedEndReads.pl:7
Can someone help me find and fix my error? The documentation on the BioPerl website indicates that Bio::SeqIO should be able to recognize the FASTQ format.
Here is the code I have written:
#!/usr/bin/perl
use Bio::SeqIO;
use Bio::SeqIO::fastq;
$seqout1 = Bio::SeqIO->new(-file => ">peread1.fastq" -format => "fastq",);
$seqout2 = Bio::SeqIO->new(-file => ">peread2.fastq" -format => "fastq",);
$seqio_obj = Bio::SeqIO->new(-file => "AIS351_Strin1edit.fastq", -format => "fastq",
                             -alphabet => "dna" );
$seq_obj = $seqio_obj->next_seq;
while ($seq_obj = $seqio_obj->next_seq) {
    $name = $seq_obj->desc;
    if ($name =~ / 1:/) {
        $seqout1->write_seq($seq_obj);
    } else {
        $seqout2->write_seq($seq_obj);
    }
}
Thanks for your help and your patience with my beginner knowledge.
~Al
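
A possible fix, offered as a sketch rather than a confirmed diagnosis. Three things in the script above look suspect:

1. The two output constructors have no comma between the -file value and -format. Perl then most likely parses ">peread1.fastq" -format as a subtraction, so Bio::SeqIO never receives a -format argument and falls back to guessing, which would explain the "Could not guess format" message.
2. The extra next_seq call before the while loop reads the first record and discards it.
3. Assuming Bio::SeqIO::fastq stores the header text after the first space in desc (so desc is "1:N:0:1" with no leading space), the pattern / 1:/ would never match and every read would go to the second file.

The sub name split_paired_reads and the command-line handling below are my own choices for illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Split an interleaved paired-end FASTQ file into two files by read number.
sub split_paired_reads {
    my ($infile, $outfile1, $outfile2) = @_;

    # Note the comma after every -file value.
    my $in   = Bio::SeqIO->new(-file => $infile,      -format => 'fastq');
    my $out1 = Bio::SeqIO->new(-file => ">$outfile1", -format => 'fastq');
    my $out2 = Bio::SeqIO->new(-file => ">$outfile2", -format => 'fastq');

    # No next_seq call before the loop, so the first record is kept.
    while (my $seq = $in->next_seq) {
        # Assumption: desc holds the header text after the first space,
        # e.g. "1:N:0:1", so the match is anchored at the start.
        my $desc = $seq->desc // '';
        if ($desc =~ /^\s*1:/) {
            $out1->write_seq($seq);
        } elsif ($desc =~ /^\s*2:/) {
            $out2->write_seq($seq);
        } else {
            warn "Skipping record with unexpected header: ", $seq->display_id, "\n";
        }
    }

    # Close explicitly so the output files are flushed to disk.
    $_->close for ($in, $out1, $out2);
    return;
}

# Run only when called with three file names on the command line.
split_paired_reads(@ARGV) if @ARGV == 3;
```

It could then be run as, for example: perl SplitPairedEndReads.pl AIS351_Strin1edit.fastq peread1.fastq peread2.fastq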