1

I want to write a subroutine that takes a FASTA file as an argument and prints out the sequence (without the header). The subroutine should check if the sequence contains any other letters than DNA bases (A, T, G, C).

Here's my code:

scalar_sequence ("sequence.fa");

sub scalar_sequence {
    my $file = $_[0];
    my $sequence;
    open (READ, $file) || die "Cannot open $file: $!.\n";
    while (<READ>){
        if (/^>/){
            next;
        } 
        if (/^[ATCG]/){
            $sequence .= $_;
        } else {
            die "invalid sequence\n";
        }
    }
    print $sequence, "\n";
}

When I run this code, I get 'invalid sequence' as output. When I leave the 'else' out, it prints out the sequence even when the sequence contains another letter.

What's the problem?

Thanks in advance!

ic23oluk
  • 125
  • 1
  • 9

1 Answers1

1

The problem is here /^[ATCG]/ this line should be /^[ATCG]+$/

Your code should be

chomp;  
next if (/^>/); # skip for header
next if(/^\s*$/);  #skip for empty line
if (/^[ATCG]+$/){
        $sequence .= $_;
    } else {
        die "invalid sequence\n";
    }

You are only consider the beginning of the line start wit A or T or G or C. You should expand the matches.

mkHun
  • 5,891
  • 8
  • 38
  • 85
  • doesn't work either, prints out 'invalid sequence' even when sequence is valid – ic23oluk Jun 14 '17 at 09:49
  • 2
    @ic23oluk Because of new line. Add `chomp` in your script. – mkHun Jun 14 '17 at 09:51
  • @ic23oluk In every fasta file last line is a new line. Please remove die from your `else` condition. And print the `$sequence`. – mkHun Jun 14 '17 at 10:03
  • that's not exactly what i want: If i leave out 'die' it prints the sequence even when it harbors invalid letters, but without the lines in which these invalid letters are. I want to write an algorithm that stops when one or more invalid letters are included and return an error message – ic23oluk Jun 14 '17 at 10:18
  • When removing the last newline in the fasta file it works as desired, but how can I do this automatically? – ic23oluk Jun 14 '17 at 10:43