I am trying to express the edit distance between each pair of sequences in a group as a percentage. So far this is what I have:

#!/usr/bin/perl -w
use strict;
use Text::Levenshtein qw(distance);

my @sequence = qw(CA--------W----------------------EKDRRTEAF---F------ 
CA--------W----------------------EKDRRTEAF---F------ 
CA--------S-------------------SLVFGQGDNIQY---F------  
RA--------S-------------------SLIYSP----LH---F------);


foreach my $list (@sequence){
    my @distance = distance($list, @sequence);
    my @length = $list =~ tr/A-Z//;    # count the letters A-Z
}

I can get the edit distances with @distance and the number of letters in each sequence with @length. Printed, the results are as follows:

@distance

0 0 13 14
0 0 13 14
13 13 0 11 
14 14 11 0

@length

13
13
16
12

Each element of @length corresponds to the element of @sequence at the same position. When comparing two sequences, I would like to divide the edit distance by the larger of their two lengths to get the percentage. For example, for the edit distance between the second and third sequences it should use the length 16 rather than 13. I think I need to take just the two relevant elements of @length and pick the larger one as the divisor, possibly using an if statement.

I know this code is wrong, but it is generally the idea I am going for:

foreach my  $list (@sequence){
        my @distance = distance($list, @sequence);      
        my @length = $list =~ tr/[A-Z]//;                # / syntax hilite fix

        foreach my $item(@distance){
                foreach @length {
                        my $num1 = if $length[0] >= $length[1];
                                 print "$item/$num1\n";
                        else my $num2 = $length[1] >= $length[0];
                                print "$item/$num2\n";
                }
        }
}

The output should look something like this:

0 0 .8125 1.0769
0 0 .8125 1.0769
.8125 .8125 0 .6875
1.0769 1.0769 .6875 0
zx8754
El David

1 Answer

Try this. To summarize: we compute the edit distance for every pair of strings, then divide each distance by the larger of the two letter counts (A-Z) for that pair. Note that the distance is computed on the full strings, gap characters included, while the length counts letters only, so the ratio can exceed 1.
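As a quick check of the arithmetic for a single pair, here is a minimal sketch using the numbers from the question (distance 13 between the second and third sequences, letter counts 13 and 16). `max` from the core List::Util module is a substitution for an explicit comparison, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Values taken from the question: the edit distance between the second
# and third sequences is 13, and their letter counts are 13 and 16.
my $distance     = 13;
my @pair_lengths = (13, 16);

# Normalize by the larger letter count of the pair.
my $ratio = $distance / max(@pair_lengths);

printf "%.4f\n", $ratio;    # prints 0.8125
```

The full script below applies the same division to every pair.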

use strict;
use warnings;

use Text::Levenshtein qw(distance);

my @sequence = qw(
        CA--------W----------------------EKDRRTEAF---F------
        CA--------W----------------------EKDRRTEAF---F------
        CA--------S-------------------SLVFGQGDNIQY---F------
        RA--------S-------------------SLIYSP----LH---F------
);

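# Count the letters A-Z in each sequence (tr returns the number of matches).
my @length = map { tr/A-Z// } @sequence;

for my $i (0..$#sequence) {
    my $list = $sequence[$i];
    # Edit distances from sequence $i to every sequence, in order.
    my @distance = distance($list, @sequence);
    my $num1 = $length[$i];
    for my $j (0..$#distance) {
        my $item = $distance[$j];
        my $num2 = $length[$j];
        # Divide by the larger letter count of the pair.
        my $num = ( $num2 > $num1 ) ? $num2 : $num1;
        printf "%.4f ", $item/$num;
    }
    print "\n";
}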

Output:

0.0000 0.0000 0.8125 1.0769 
0.0000 0.0000 0.8125 1.0769 
0.8125 0.8125 0.0000 0.6875 
1.0769 1.0769 0.6875 0.0000 
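
Because the matrix is symmetric, each pair only needs to be computed once. The following is a sketch of that variation, not part of the script above: `levenshtein` is a hypothetical pure-Perl stand-in for Text::Levenshtein's `distance` (so the example runs without CPAN modules), and `@ratio` is an illustrative name. `max` and `min` come from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Hypothetical stand-in for Text::Levenshtein's distance(): the standard
# dynamic-programming edit distance, keeping only two rows at a time.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my @sequence = qw(
        CA--------W----------------------EKDRRTEAF---F------
        CA--------W----------------------EKDRRTEAF---F------
        CA--------S-------------------SLVFGQGDNIQY---F------
        RA--------S-------------------SLIYSP----LH---F------
);

# Letter count (A-Z) of each sequence.
my @length = map { tr/A-Z// } @sequence;

# Fill the upper triangle and mirror it into the lower one.
my @ratio;
for my $i (0 .. $#sequence) {
    $ratio[$i][$i] = 0;
    for my $j ($i + 1 .. $#sequence) {
        my $d = levenshtein($sequence[$i], $sequence[$j]);
        $ratio[$i][$j] = $ratio[$j][$i] = $d / max($length[$i], $length[$j]);
    }
}

print join(' ', map { sprintf '%.4f', $_ } @$_), "\n" for @ratio;
```

With the same four sequences this should print the same matrix as above, without the trailing space on each line.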
Håkon Hægland
  • Nice work. Btw, adding a comment like `# /` at the line with regex will turn off the wrong syntax highlight for the rest of the post. Sometimes you have to play with it, depending on what in regex sets it off. Here I'd expect `# /` to do it. – zdim Dec 22 '16 at 23:07
  • @zdim Thanks. [Seems like](http://meta.stackexchange.com/questions/184108/what-is-syntax-highlighting-and-how-does-it-work) stackexchange.com uses [Google's code-prettify](https://github.com/google/code-prettify). If you like, we could try to fix some issues with Perl syntax highlighting on SO (seems like we need to write our patches in JavaScript though :)) Have a nice Christmas from Norway. – Håkon Hægland Dec 24 '16 at 16:58
  • Thank you for well wishes. I may have got a glimpse of Norway on an Oregon mountain, with feet of snow and lots of sunshine. (But I somehow think that 20F was way too warm?) I hope you had a good holiday. – zdim Dec 29 '16 at 10:18
  • That is a very good idea, to actually fix things! (I don't know how one goes about doing that.) As for JS, it would be an opportunity for me to finally get into it. – zdim Dec 29 '16 at 10:20
  • @zdim Hi zdim. Actually the climate in Norway is quite varied. I live in Bergen on the west coast, and it's not so cold here :) I have never been in Oregon, but I visited California 15 years ago... Anyway, I created a fork of Google's code-prettify GitHub repository. So let's continue the discussion [in its issue tracker](https://github.com/hakonhagland/code-prettify/issues/1). See you. – Håkon Hægland Dec 31 '16 at 19:18