1

Many thanks for the help with the earlier issues.

I've almost finished the last thing I was working on - specifically an ORF (open reading frame) finder program. So far, I've got an array called @AminoAcidArray1. All the start codons are "_" and all the stop codons are "X".

How do I count the ORFs? Put another way, how do I count the times in the array when "_" is followed by "X", with arbitrary ignorable characters in between? What sort of loop should I be using? I think I need a =~ in there somewhere.

And yes, I know BioPerl can do this easily, but only ActivePerl is available for some reason.

Sincerest thanks, Shtanto

Shtanto
  • Can you please clarify what you want to achieve? I have no idea what you are asking for. – Karel Bílek May 13 '12 at 20:56
  • Sure. The array @AminoAcidArray1 has occurrences of the characters "_" and "X". I want to count the number of times "_" followed by "X" is found, ignoring whatever might be in between. So, whenever the array has an underscore character in it, an open reading frame starts. Whenever that underscore is followed by an X in the array, the open reading frame stops. – Shtanto May 13 '12 at 21:54
  • Do you need to know where the ORFs are or just how many there are? – flies May 14 '12 at 19:07

3 Answers

2

First, contemporary ActivePerl has Bundle::BioPerl in its main 'Activeperl' repository. This should allow a BioPerl installation on some ActivePerl versions.

Then,

print  "$-[0]..$+[0]\n" while $orf =~ /_[^X]*X/g;

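prints the start and end offsets of each ORF in the string $orf (build it first with my $orf = join '', @AminoAcidArray1;). $-[0] is the offset of the "_" and $+[0] is the offset just past the "X". This works if the ORFs are consecutive (not nested). If they are nested, you'd have to use slightly more complicated expressions (with recursion).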
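To answer the counting question directly, the same pattern can be evaluated in list context and the matches counted. This is only a minimal sketch, assuming the array holds one single-character symbol per element; the sub name count_orfs_regex and the sample data are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping "_ ... X" stretches.
sub count_orfs_regex {
    my @symbols = @_;
    my $orf = join '', @symbols;           # the regex needs a single string
    # Assigning the matches to an empty list, then to a scalar,
    # yields the number of matches.
    my $count = () = $orf =~ /_[^X]*X/g;
    return $count;
}

# Made-up sample: two complete ORFs, then a start with no stop.
my @sample = qw(A _ B C X D _ _ E X F _ G);
print count_orfs_regex(@sample), "\n";
```

A second "_" inside an already open frame is swallowed by [^X]*, so it does not start a new ORF, and a trailing "_" with no "X" after it is not counted.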

BTW, what does the expression

print join ',', @AminoAcidArray1;

print on your console?

rbo

rubber boots
  • print "$-[0]..$+[0]\n" while $orf =~ /_[^X]*X/g; Thank you, that'd be the regular expression I was after. Strangely, it doesn't print anything out though. I must be using it with the wrong variable. That second line makes the output look groovy, all the letters are separated with commas now. Very nice :) – Shtanto May 14 '12 at 00:15
  • @Shtanto - "_Strangely, it doesn't print anything out though_" -- then you'd have to **insert the line** `my $orf = join '', @AminoAcidArray1;` **before** the said line. – rubber boots May 14 '12 at 09:09
1

If I understand your comment correctly: since you have an array, you don't need the =~ operator.

You need to traverse the array once and remember the current state, that is, whether you are inside what you call an "open reading frame". Say:

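my $state = 0;   # 1 while inside an open reading frame
my $count = 0;   # number of completed ORFs
for my $item (@array) {
    if ($item eq "_") {
        $state = 1;        # start codon opens a frame (no effect if one is already open)
    } elsif ($item eq "X") {
        if ($state == 1) {
            $state = 0;    # stop codon closes the open frame
            $count++;
        }
    }
}

return $count;   # inside a sub; in a plain script, print $count instead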
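If the positions matter as well (as asked in the comments on the question), the same single pass can remember where each frame opened. A rough sketch under the same assumptions, one symbol per array element; the sub name find_orfs and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [start_index, stop_index] pairs.
sub find_orfs {
    my @symbols = @_;
    my @orfs;
    my $start;   # undef while no frame is open
    for my $i (0 .. $#symbols) {
        if ($symbols[$i] eq '_') {
            $start = $i unless defined $start;   # keep the first "_" of a frame
        } elsif ($symbols[$i] eq 'X' && defined $start) {
            push @orfs, [$start, $i];            # frame closed by this "X"
            undef $start;
        }
    }
    return @orfs;
}

# Made-up sample: two complete ORFs, then a start with no stop.
my @sample = qw(A _ B C X D _ _ E X F _ G);
my @orfs = find_orfs(@sample);
print scalar(@orfs), " ORFs\n";
print "$_->[0]..$_->[1]\n" for @orfs;
```

The number of ORFs is then just the size of the returned list, so one sub covers both counting and locating.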
Karel Bílek
  • Thanks Karel. Code like this is what students dream of finding. I tweaked in my own particular array name, swapped return for a common print statement and bingo, all done. The great thing about this is that it'll work in so many different contexts. Array traversal is a common enough program requirement. I know how to read the program logic on screen based on my knowledge of Java, but writing it myself is still quite tricky. Bit like my German :) Sorry about being so domain specific. Your help is greatly appreciated. – Shtanto May 14 '12 at 00:33
0

Your question is too specific to your domain, but as I understand it, you want to count some occurrences in an array. This is what I do in the following code (I use perlconsole):

Perl> my @a = qw/az ae ar at ay au pr lf/
8

Perl> my $count = grep /^a/, @a
6

Perl> print "$count\n"
6
1

Perl> 
Gilles Quénot