
I'm trying to export aligned sequences to a FASTA file one by one using Bio::SeqIO. In the output, each sequence is wrapped onto a new line every 60 columns. How do I avoid that?
I'd like the sequences exported in a 'wide' format, i.e. with no line breaks inside the sequence.

My code is roughly:

use File::Basename;   # provides fileparse()
use Bio::SeqIO;

my $seqin   = Bio::SeqIO->new(-file => "<$fastaFile", -format => 'Fasta');
# Strip the extension from the input name and append "_sub.fasta"
my $outname = fileparse($fastaFile, qr/\.[^\.]*$/) . "_sub.fasta";
my $seqout  = Bio::SeqIO->new(-file => ">$outname", -format => 'Fasta');

while (my $seq = $seqin->next_seq) {
    # do something with $seq
    $seqout->write_seq($seq);
}
Roey Angel
  • Silly idea which doesn't really answer the question, but if you only plan to do this once or a few times, you could probably throw together a one-liner to remove the newlines. Something like: remove all newlines from lines which don't begin with ">", then replace ">" with "\n>" to get the newline back before each header. – Memento Mori Feb 23 '13 at 10:49
  • Yep, that's pretty much what I ended up doing. Just thought there might be a more elegant way. – Roey Angel Feb 24 '13 at 08:35
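For reference, the post-processing approach described in the comments could be sketched roughly as below, using core Perl only. The helper name unwrap_fasta and the sample records are made up for illustration and are not part of BioPerl; the sketch also assumes the whole file fits in memory as one string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): joins the wrapped sequence
# lines of each FASTA record into one line, leaving header lines as-is.
sub unwrap_fasta {
    my ($text) = @_;
    my $out    = '';
    my $in_seq = 0;    # true while sequence text is pending a final newline

    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^>/) {
            $out .= "\n" if $in_seq;    # close the previous sequence line
            $out .= "$line\n";          # header goes on its own line
            $in_seq = 0;
        }
        elsif (length $line) {
            $out .= $line;              # append sequence text, no newline yet
            $in_seq = 1;
        }
    }
    $out .= "\n" if $in_seq;            # close the last sequence line
    return $out;
}

# Made-up sample input: two records, the first wrapped over two lines
my $wrapped = ">seq1\nACGT\nTTGA\n>seq2\nGG--CC\n";
print unwrap_fasta($wrapped);
```

Blank lines are dropped and anything not starting with ">" is treated as sequence text, so this is only suitable for plain FASTA without comment lines.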

1 Answer


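Bio::SeqIO::fasta provides a width method that sets how many sequence characters are written per line of a FASTA record. Setting it to the length of each sequence before writing puts the whole sequence on one line: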

while (my $seq = $seqin->next_seq) {
    $seqout->width($seq->length);
    $seqout->write_seq($seq);
}

Or, if your sequences have a known maximum length, you can set a single, sufficiently large width once before the loop:

$seqout->width(5000);

John Marshall