There's no need for your longestRepeat
function to check which of the three cases it's handling -- in general, when you find you've written exactly the same instructions multiple times, it's a hint that you can factor out the repetition and simplify your program thereby. Consider the following, which I've cleaned up for functionality and commented for illustrative purposes:
#!/usr/bin/env perl
use warnings;
use strict;
# no need to declare and define separately; this works fine
# also no need for space before semicolon
my $DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT";
print "$DNA\n";
print "The longest AT repeat is " . longestRepeat($DNA, "AT") . "\n";
print "The longest TAGA repeat is " . longestRepeat($DNA, "TAGA") . "\n";
print "The longest C repeat is " . longestRepeat($DNA, "C") . "\n";
sub longestRepeat {
# note that, within a function, @_ is the default argument to shift();
# hence its absence in the next two lines. (in practice, you're more
# likely to see 'shift' in this context without even parentheses, much
# less the full 'shift(@_)'; be prepared to run into it.)
my $sequence = shift(); # take the first argument
my $kmer = shift(); # take the second argument
# these state variables we'll use to keep track of what we're doing here;
# $longest_match, a string, will eventually be returned.
my $longest_matchlen = 0;
my $longest_match = '';
# for each match in $sequence of one or more $kmer repeats...
while ($sequence =~ m@($kmer)+@g) {
# ...get the length of the match, stored in $1 by the parenthesized
# capture group, with the '+' quantifier grabbing the longest match
# available from each starting point (see `man perlre' for more)...
my $this_matchlen = length($1);
# ...and if this match is longer than the longest yet found...
if ($this_matchlen > $longest_matchlen) {
# ...store this match's length in $longest_matchlen...
$longest_matchlen = $this_matchlen;
# ...and store the match itself in $longest_match.
$longest_match = $1;
}; # end of the 'if' statement
}; # end of the 'while' loop
# at this point, the longest match we found is in $longest_match; if
# we found no matches, then $longest_match still contains the empty
# string we assigned up there before the while loop started, which is
# the correct result in a case where $kmer never appears in $sequence.
return $longest_match;
};
You're studying bioinformatics, aren't you? I have some experience of teaching Perl to bioinformaticians, and I gather there is an extremely broad distribution of programming skill and talent in the field, with a rather unfortunate hump toward the left-hand side of the graph -- a polite way of saying that, as a professional programmer, most of the bioinformatics Perl code I've seen has ranged from not very good to quite poor indeed.
I mention this with no intent to give insult, but only to substantiate my very strong recommendation that you include some computer science courses in whatever curriculum you're currently pursuing; the more exposure you can give yourself to the general concepts and habits of thinking which are involved in the accurate formulation of algorithms, the more prepared you'll be to tackle the requirements imposed by your field -- indeed, more prepared than most, in my experience; while I'm not a bioinformatician myself, in working with people who are, it seems to me that a strong programming background may well be more useful to a bioinformatician than a strong background in biology.