
My question is very simple: I have a database that looks like this: (image of the input file)

My goal is just to remove the newline (\n) at the end of every sequence line, but NOT at the end of the header lines. I tried the following code:

#!/usr/bin/perl
use strict;
my $db = shift;
my $outfile= "Silva_chomped_for_R_fin.fasta";
my $header;
my $seq;
my $kick = ">";

open(FASTAFILE, $db);
open(OUTFILE,">". $outfile);

while(<FASTAFILE>) {
    my $currentline = $_;
    chomp $currentline;
    if ($currentline =~ m/^$kick/) {
        $header = $currentline;
    } else {
        chomp $currentline;
        $seq = $currentline;
    }
    my $path = $header.$seq."\n";
    print(OUTFILE $path);
}

close OUTFILE;
close FASTAFILE;
exit;

But instead of having just the sequence lines chomped, I obtain the following: (image of the output file)

It is as if chomp didn't work at all. Any idea what I am doing wrong? Thanks a lot, Alfredo

Mark Setchell
  • Instead of reinventing the wheel again and again, you should use an existing module for reading FASTA files ([for example this one](https://metacpan.org/pod/FAST::Bio::SeqIO)) and put more effort into solving the real problem. – clt60 Jan 06 '18 at 12:36
  • Please don't post images of text. They are useless if we want to copy and paste the text in order to try to solve your problem. – Dave Cross Jan 06 '18 at 16:07
  • Thanks a lot @jm666 for the hint; I should start using FAST::Bio. – Alfredo Mari Jan 08 '18 at 10:29
  • @DaveCross, sorry about that; I thought my issue was simple enough that the error could be spotted immediately from the image. I will try not to do it again. – Alfredo Mari Jan 08 '18 at 10:32

2 Answers


There are three issues with your while() loop.

  • You are chomp()'ing unconditionally at the beginning of the loop.
  • You are then re-adding the newline character at the end of the loop (defeating the purpose of the previous chomp()).
  • You are concatenating the header to every line.
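To see these issues concretely, here is a minimal sketch that runs the question's loop body over a small in-memory FASTA string instead of a file (the redundant second chomp is left out). The sub name `question_loop` and the sample sequence are made up for illustration, and `$seq` is initialized to an empty string only to avoid "uninitialized" warnings under `use warnings`.

```perl
use strict;
use warnings;

# Hypothetical helper: replays the question's loop over a string so the
# output can be inspected. $seq starts as '' (an assumption; the question
# leaves it undef) purely to keep 'use warnings' quiet.
sub question_loop {
    my ($text) = @_;
    open(my $in, '<', \$text) or die "Cannot open in-memory input: $!";
    my ($header, $seq) = ('', '');
    my $result = '';
    while (my $currentline = <$in>) {
        chomp $currentline;                  # issue 1: headers are chomped too
        if ($currentline =~ m/^>/) {
            $header = $currentline;
        } else {
            $seq = $currentline;
        }
        # issues 2 and 3: header glued to every line, newline re-added
        $result .= $header . $seq . "\n";
    }
    close $in;
    return $result;
}

print question_loop(">seq1\nACGT\nTTGA\n");
```

With the sample input this prints `>seq1`, `>seq1ACGT` and `>seq1TTGA` on separate lines: the header is prepended to every sequence line and every line still ends in a newline.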

Here is a simplified version.

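use strict;
use warnings;

my $db = shift or die "Usage: $0 input.fasta\n";
my $outfile = "out.fasta";

open(my $fh, "<", $db) or die "Could not open input file '$db': $!";
open(my $out, ">", $outfile) or die "Could not open output file '$outfile': $!";

my $header;

while (<$fh>) {
    # True (1) for header lines starting with '>', false ('') otherwise
    $header = /^>/;
    # Remove the newline from sequence lines only
    chomp unless $header;
    # Before every header except the first, end the previous joined sequence
    print $out $. > 1 && $header && "\n", $_;
}

# End the last sequence, whose newline was chomped
print $out "\n" if defined $header && !$header;

close $out;
close $fh;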

The line

print $out $. > 1 && $header && "\n", $_;

will conditionally prepend a newline to the output if the current line begins with '>', unless it is the first line in the file. This terminates the previous (now joined) sequence line before each new header. (The $. variable holds the current line number.)
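If the `&&` chain looks cryptic: in Perl, `&&` returns the last value it evaluated, so the expression yields either a false value (here always the empty string) or `"\n"`. The following sketch, using a hypothetical helper `newline_prefix`, evaluates the same expression for a few sample lines.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluates the same expression as the answer's
# print statement for a given line number and input line.
sub newline_prefix {
    my ($line_number, $line) = @_;
    my $header = $line =~ /^>/;    # 1 for a header line, '' otherwise
    return $line_number > 1 && $header && "\n";
}

for my $case ([1, ">seq1\n"], [2, "ACGT\n"], [3, ">seq2\n"]) {
    my $prefix = newline_prefix(@$case);
    printf "line %d: %s\n", $case->[0],
        $prefix eq "\n" ? 'newline prepended' : 'nothing prepended';
}
```

This prints "nothing prepended" for lines 1 and 2 and "newline prepended" for line 3, the second header.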

Credit: ikegami spotted that my original code failed to allow for more than one sequence in the input database.

David Collins
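use strict;
use warnings;

# Reads the files named on the command line (or STDIN) and writes to STDOUT.
my $add_lf = 0;   # true while a joined sequence line still needs its newline
while (<>) {
   chomp;
   if (/^>/) {
      print("\n") if $add_lf;   # end the previous sequence
      print("$_\n");            # header lines keep their own newline
      $add_lf = 0;
   } else {
      print;                    # append sequence text without a newline
      $add_lf = 1;
   }
}

print("\n") if $add_lf;         # end the final sequence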
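A minimal sketch for trying this approach without files: the same state machine wrapped in a hypothetical sub `flatten_fasta` that reads from and writes to strings through in-memory filehandles (`open` on a scalar reference). It also shows that the final print terminates the last sequence even when the input lacks a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical sub: same logic as the answer above, but using strings
# (via in-memory filehandles) instead of ARGV and STDOUT.
sub flatten_fasta {
    my ($text) = @_;
    open(my $in, '<', \$text) or die "Cannot open in-memory input: $!";
    my $result = '';
    open(my $out, '>', \$result) or die "Cannot open in-memory output: $!";
    my $add_lf = 0;    # true while a joined sequence still needs "\n"
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^>/) {
            print $out "\n" if $add_lf;    # end the previous sequence
            print $out "$line\n";          # header keeps its own line
            $add_lf = 0;
        } else {
            print $out $line;              # join sequence text
            $add_lf = 1;
        }
    }
    print $out "\n" if $add_lf;            # end the final sequence
    close $out;
    close $in;
    return $result;
}

# Two records; the input deliberately lacks a final newline.
print flatten_fasta(">seq1\nACGT\nTTGA\n>seq2\nGGCC\nAATT");
```

For this sample input the returned string is ">seq1\nACGTTTGA\n>seq2\nGGCCAATT\n".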
ikegami