2

I am new to Perl world, and I have a script that compares two arrays.

I use List::MoreUtils (each_arrayref) to do the comparison.

I have two questions:

1) Is there a way to compare two chunks of arrays (like natatime but for two arrayrefs) instead of comparing single element at a time as in each_arrayref?

The elements should be from the same index from each array.

The data structure is something like this:

{
  atr => [qw/ a b c d /],
  ats => [qw/ a b c d /],
  att => [qw/ a b c d /],
}

This is what I have got so far.

my @lists = keys %{$hash};

for (my $i = 0; $i <= @lists; $i++) {

  my $list_one = $lists[$i];
  my $one = $hash->{$list_one};

  for (my $j = 0 ; $j <= @lists ; $j++) {

    my $list_two = $lists[$j];
    my $two = $hash->{$list_two};

    my ($overlapping, $mismatch, $identity);
    my $match          = 0;
    my $non_match      = 0;
    my $count_ac_calls = 0;
    my $each_array     = each_arrayref($one, $two);

    while (my ($call_one, $call_two) = $each_array->()) {

      if ((defined $call_one) && (defined $call_two)) {
        if ($call_one eq $call_two) {
          $match++;
        }
        if ($call_one ne $call_two) {
          $non_match++;
        }
      }
    }    #end of while loop $each_array->()

    print "$list_one,$list_two,$match,$non_match";

  }    #end of for j loop
}    #end of for i loop

I would like to compare atr->ats, atr->att, ats->att. But with my current code, I get repetitions of comparison like ats->atr att->atr,att->ats.

2) How can I avoid those?

Borodin
  • 126,100
  • 9
  • 70
  • 144
user1958532
  • 113
  • 1
  • 13

3 Answers3

4

I'm not clear what your first question means. Do you want an iterator that, say, returns (('a','b','c'),('a','b','c')) instead of ('a','a')? If so then there isn't one available in a library, but it wouldn't be hard to write your own.

As for the second, the usual way to avoid items being compared with themselves is to change the inner loop to start after the current value of the first. Like so

for my $i (0..$#lists) {

  for my $j ($i+1..$#lists) {

  }

}

This works because A eq B is generally the same as B eq A, so there is no point in comparing an entry with one earlier in the list because the inverse comparison has already been made.

Note that it is much better Perl to write for loops this way than the messy C-style syntax. You also have a couple of bugs in

for (my $i = 0 ; $i <= @lists ; $i++) { ... }

because the maximum index of @lists is one less than the scalar value of @lists - usually coded as $#lists. The same problem exists in your loop for $j.

Update

Here is a refactoring of your program, written to include the ideas I have described and to be more Perlish. I hope it is useful to you.

use strict;
use warnings;

use List::MoreUtils 'each_arrayref';

my $hash = {
  atr => [qw/ a b c d /],
  ats => [qw/ a b c d /],
  att => [qw/ a b c d /],
};

my @keys = keys %{$hash};

for my $i (0 .. $#keys) {

  my $key1 = $keys[$i];
  my $list1 = $hash->{$key1};

  for my $j ($i+1 .. $#keys) {

    my $key2 = $keys[$j];
    my $list2 = $hash->{$key2};

    my ($match, $non_match) = (0, 0);
    my $iter = each_arrayref($list1, $list2);

    while (my ($call1, $call2) = $iter->()) {
      if (defined $call1 and defined $call2) {
        ($call1 eq $call2 ? $match : $non_match)++;
      }
    }

    print "$key1, $key2, $match, $non_match\n";
  }
}
Borodin
  • 126,100
  • 9
  • 70
  • 144
  • 2
    `<= $#lists` or `< @lists` – ysth Jan 08 '13 at 17:12
  • Thanks Borodin. This looks more neat than what I have got. And regarding my first question, I would like to compare something like this abcd of atr to abcd of ats rather than a->a, b->b,c->c,d->d. – user1958532 Jan 09 '13 at 10:42
1

One option is to use Array::Compare to return the number of different array elements. Also, Math::Combinatorics is used to obtain only unique comparisons.

use strict;
use warnings;
use Array::Compare;
use Math::Combinatorics;

my %hash = (
    'atr' => [ 'a', 'b', 'c', 'd' ],
    'ats' => [ 'a', 'b', 'c', 'd' ],
    'att' => [ 'a', 'c', 'c', 'd' ],
);

my $comp = Array::Compare->new( DefFull => 1 );
my $combinat = Math::Combinatorics->new(
    count => 2,
    data  => [ keys %hash ],
);

while ( my ($key1, $key2) = $combinat->next_combination ) {
    my $diff = $comp->compare( \@{ $hash{$key1} }, \@{ $hash{$key2} } );
    print "$key1,$key2," . ( @{ $hash{$key1} } - $diff ) . ",$diff\n";
}

Output:

ats,att,3,1
ats,atr,4,0
att,atr,3,1
Kenosis
  • 6,196
  • 1
  • 16
  • 16
  • This will work only in the unlikely event that the real data is all single-character strings. – Borodin Jan 08 '13 at 23:20
  • @Borodin - Yes, excellent point. I may have take the OP's "The data structure is something like this" too literally. Originally used Array::Compare; will reinstate it using a full compare. Thanks. – Kenosis Jan 09 '13 at 01:10
-1

You're not really taking advantage of the features Perl has to offer. Rather than use an error prone C-style loop, just use for my $var (LIST). You can also skip redundant list checking by skipping the self-checks, too. I've taken your script, made some alterations, and I'm sure you'll find it a bit easier to read.

use v5.16;
use warnings;
use List::MoreUtils qw{each_arrayref};

my $hash = {
  'atr' => [
    'a',
    'b',
    'c',
    'd'
   ],
  'ats'=>[
    'a',
    'b',
    'c',
    'd'
   ],
  'att' => [
    'a',
    'c',
    'c',
    'd'
   ],
};

for my $list_one (keys $hash) {
    my $one = $hash->{$list_one};

    for my $list_two (keys $hash) {
        next if $list_one ~~ $list_two;

        my $two = $hash->{$list_two};

        my ($match, $non_match);
        $match = $non_match = 0;

        my $each_array = each_arrayref($one, $two);
        while (my ($call_one, $call_two) = $each_array->()) {
            if($call_one && $call_two) {
                if($call_one eq $call_two) {
                    $match++;
                }
                else {
                    $non_match++;
                }
            }
        }

        print "$list_one,$list_two,$match,$non_match\n";
    }
}

You'll want to evaluate one at a time anyway so that you can add in some extra bits like the index location. (Yes, you could use the C-style loop, but that'd be a bit more difficult to read.)

titanofold
  • 2,852
  • 1
  • 15
  • 21
  • `while (my ($list_one, $one) = each %$hash){...}` is even better. – amon Jan 08 '13 at 18:49
  • @amon: That isn't working for me. The script gets stuck in an infinite loop. – titanofold Jan 08 '13 at 19:08
  • Ah, sorry, yes, I forgot that there can only be one iterator per data structure :/ – amon Jan 08 '13 at 19:13
  • Yup, was just about to say I know why thanks to the friendly manual: http://perldoc.perl.org/functions/each.html – titanofold Jan 08 '13 at 19:14
  • 2
    This code compares `atr` with `ats` as well as `ats` with `atr` etc. There is no need for `use v5.16` as there is nothing in the program that v5.8 won't handle. There is no reason to use `$list_one ~~ $list_two` over `$list_one eq $list_two`. And `$call_one && $call_two` is not a valid substitute for `defined $call_one && defined $call_two` if any of the data can be `'0'` or `''`. – Borodin Jan 08 '13 at 23:27
  • Hi Again, Main reason for my question is to increase the efficiency of the code.For smaller data set it works wonderfully. But when I have more than like 50 sets that it's painfully slow. Thanks once again for all your wonderful suggestions. – user1958532 Jan 09 '13 at 11:10