I want to write a subroutine that takes the name of a FASTA file as an argument and prints the sequence (without the header). The subroutine should also check whether the sequence contains any letters other than the DNA bases (A, T, G, C).
Here's my code:
    scalar_sequence("sequence.fa");

    sub scalar_sequence {
        my $file = $_[0];
        my $sequence;
        open(READ, $file) || die "Cannot open $file: $!.\n";
        while (<READ>) {
            if (/^>/) {
                next;
            }
            if (/^[ATCG]/) {
                $sequence .= $_;
            } else {
                die "invalid sequence\n";
            }
        }
        print $sequence, "\n";
    }
When I run this code, I get 'invalid sequence' as output. When I leave out the 'else' branch, it prints the sequence even when the sequence contains other letters.
What's the problem?
Thanks in advance!
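For comparison, here is a minimal sketch of one way to validate every character of each sequence line rather than only the first. The pattern `/^[ATCG]/` is anchored only at the start of the line, so it succeeds as soon as the first character is a base. That is why lines containing other letters slip through once the `else` is removed. The 'invalid sequence' error presumably comes from some line whose first character is not an uppercase A, T, C or G (for example a blank line or lowercase bases), though the post does not show the input file. The sketch anchors the pattern at both ends (`\A[ACGT]+\z`), strips trailing whitespace (the newline, plus any `\r` from Windows line endings) before matching, and uses a lexical filehandle with three-argument `open`. Accepting lowercase bases via `/i`, skipping blank lines, and returning the sequence as well as printing it are assumptions; adjust them to match your files.

```perl
use strict;
use warnings;

# Read a FASTA file, skip header lines, and die unless every sequence
# character is A, C, G or T. ASSUMPTION: lowercase bases are accepted
# (the /i flag); remove /i to accept uppercase only.
sub scalar_sequence {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my $sequence = '';
    while (my $line = <$fh>) {
        next if $line =~ /^>/;      # header line
        $line =~ s/\s+\z//;         # strip trailing newline, \r and spaces
        next if $line eq '';        # ASSUMPTION: tolerate blank lines
        # Anchor at BOTH ends so every character is checked,
        # not just the first one as /^[ATCG]/ does.
        die "invalid sequence at line $.: $line\n"
            unless $line =~ /\A[ACGT]+\z/i;
        $sequence .= $line;
    }
    close $fh;
    print $sequence, "\n";
    return $sequence;
}

# Run on the file named on the command line, if one is given.
scalar_sequence($ARGV[0]) if @ARGV;
```

The subroutine can be called the same way as before, e.g. `scalar_sequence("sequence.fa");`. Because the newline is stripped before each line is appended, the printed sequence comes out as a single line.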