
I'm new to Perl and have a question about using a hash of arrays to retrieve specific columns. My code is the following:

my %hash = ( name1 => ['A', 'A', 'B', 'A', 'A', 'B'],
             name2 => ['A', 'A', 'D', 'A', 'A', 'B'],
             name3 => ['A', 'A', 'B', 'A', 'A', 'C'],
             );

# the values of %hash are returned as arrays, not as strings (as I want)

foreach my $name (sort keys %hash ) {
    print "$name: ";
    print "$hash{$name}[2]\n";
}

for (my $i=0; $i<$length; $i++) {
        my $diff = "no";
        my $letter = '';
        foreach $name (sort keys %hash) {
            if (defined $hash{$name}[$i]) {
                if ($hash{$name}[$i] =~ /[ABCD]/) {
                    $letter = $hash{$name}[$i];
                }
                elsif ($hash{$name}[$i] ne $letter) { 
                    $diff = "yes";
                }
            }
            if ( $diff eq "yes" ) {
                foreach $name (sort keys %hash) {
                    if (defined $hash{$name}[$i]) { $newhash{$name} .= $hash{$name}[$i]; }  
                }
            }
        }
    }
    foreach $name (sort keys %newhash ) {
        print "$name: $newhash{$name} \n";
    }

I want the output of this program to be something like a new hash with only the variable columns:

my %newhash = ( name1 => 'BB',
            name2 => 'DB',
            name3 => 'BC',
              );

but I only get this warning: Use of uninitialized value $letter in string ne at test_hash.pl line 31.

Does anyone have ideas about this? Cheers

EDIT:

Many thanks for your help in this question.

I edited my post to conform with the suggestions of frezik, dan1111 and Jean. You're right, now there are no warnings, but I also don't get any output from the print statement, and I have no clue why.

@TLP: OK, I just generated a random set of columns without any particular order. What I really want depends on how the letters vary: if the letters at the same array index (stored in the hash) are the same for every key, discard that column, but if the letters differ between keys, I want to store that column in a new hash.

Cheers.

PedroA
  • What is your definition of "the variable columns"? It looks like you want all letters except A. Or possibly letters in the 3rd and 6th columns. Or every third letter. But from your code, I somehow get the impression that it's something about how the letters vary. So which is it? – TLP Oct 17 '12 at 15:10
  • Ok, so you want to compare the different arrays to each other, and if a column goes "AAA" or "BBB" skip it. However, then you have to save the letters first, then do the check. – TLP Oct 17 '12 at 15:30
  • Precisely, but I thought I was saving the letters in the variable $letter. However, it seems that it is not adding those to the new hash... – PedroA Oct 17 '12 at 15:39

6 Answers


I assume that by this, you want to match any of the letters A,B,C, or D:

if ($hash{$name}[$i] =~ /ABCD/)

However, as written, it matches the exact string "ABCD". You need a character class for what you want:

if ($hash{$name}[$i] =~ /[ABCD]/)

However, you have other logic problems as well, that can lead you to compare to $letter before it has been set. Setting it to empty (as Jean suggested) is a simple option that may help.

Another problem is here:

print "$name: @{ $newhash{$name} }\n";

%newhash is not a hash of arrays, so you need to remove the array dereference:

print "$name: $newhash{$name} \n";
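
To make the difference between the two patterns concrete, here is a minimal sketch. The helper names `matches_literal` and `matches_class` and the sample strings are made up for this demonstration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this demonstration.
sub matches_literal { return $_[0] =~ /ABCD/   ? 1 : 0 }  # needs the exact substring "ABCD"
sub matches_class   { return $_[0] =~ /[ABCD]/ ? 1 : 0 }  # needs any one of A, B, C or D

# Sample values are assumptions, chosen to show where the patterns disagree.
for my $value ('A', 'D', 'ABCD', 'X') {
    printf "%-4s literal=%d class=%d\n",
        $value, matches_literal($value), matches_class($value);
}
```

A single letter such as 'A' only matches the character class, which is why the original condition never set $letter.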
dan1111

You may be interested in this alternative solution

use strict;
use warnings;

my %hash = (
  name1 => ['A', 'A', 'B', 'A', 'A', 'B'],
  name2 => ['A', 'A', 'D', 'A', 'A', 'B'],
  name3 => ['A', 'A', 'B', 'A', 'A', 'C'],
);

my @columns;

for my $list (values %hash) {
  $columns[$_]{$list->[$_]}++ for 0 .. $#$list;
}

my %newhash = %hash;

for my $list (values %newhash) {
  $list = join '', map $list->[$_], grep keys %{$columns[$_]} > 1, 0 .. $#$list;
}

use Data::Dump;
dd \%newhash;

output

{ name1 => "BB", name2 => "DB", name3 => "BC" }
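
For readers who find the compact version above hard to follow, the same idea can be unrolled into three explicit steps. This is only a sketch of one way to read it, not a replacement: the `@variable` array and the array slice are choices made here for illustration, and it prints with plain `print` instead of Data::Dump.

```perl
use strict;
use warnings;

my %hash = (
  name1 => ['A', 'A', 'B', 'A', 'A', 'B'],
  name2 => ['A', 'A', 'D', 'A', 'A', 'B'],
  name3 => ['A', 'A', 'B', 'A', 'A', 'C'],
);

# Step 1: for every column index, count how often each letter occurs.
my @columns;
for my $list (values %hash) {
  for my $i (0 .. $#$list) {
    $columns[$i]{ $list->[$i] }++;
  }
}

# Step 2: a column varies if more than one distinct letter was seen in it.
# (@variable is a name introduced for this sketch.)
my @variable = grep { scalar(keys %{ $columns[$_] }) > 1 } 0 .. $#columns;

# Step 3: for each name, join the letters found at the variable indexes.
my %newhash;
for my $name (keys %hash) {
  $newhash{$name} = join '', @{ $hash{$name} }[@variable];
}

print "$_ => $newhash{$_}\n" for sort keys %newhash;
```

Unlike the compact version, this builds %newhash from scratch instead of copying %hash and overwriting its values.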
Borodin

Your scalar $letter is not defined. Add this to get rid of the warning.

my $letter='';
Jean

if ($hash{$name}[$i] =~ /ABCD/) {

The regex above would match a string like __ABCD__ or ABCD1234, but never a lone A or B. You probably wanted to match any one of those letters, and it's a good idea to anchor the regex, too:

if ($hash{$name}[$i] =~ /\A [ABCD] \z/x) {

(The /x option means that whitespace is ignored, which helps make regexes a bit easier to read.)
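
As a small illustration of what the anchors add, the sketch below compares the unanchored character class with the anchored one. The helper name `is_single_letter` and the test strings are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the whole string is exactly one of A, B, C, D.
sub is_single_letter { return $_[0] =~ /\A [ABCD] \z/x ? 1 : 0 }

# Sample strings are made up to show where the two patterns differ.
for my $value ('A', 'AB', 'xAx', 'T') {
    my $unanchored = $value =~ /[ABCD]/ ? 1 : 0;   # matches anywhere in the string
    printf "%-3s unanchored=%d anchored=%d\n",
        $value, $unanchored, is_single_letter($value);
}
```

The unanchored pattern accepts any string that contains one of the letters, while the anchored one accepts only a string that consists of exactly one such letter.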

You would still get the warning in the example above when $i == 2 and the inner loop happens to hit the keys name1 or name3 first. Since the regex doesn't match T, $letter will remain uninitialized.

frezik

I think it's a mistake to check the letters one by one. It seems easier to just collect all the letters and check them at once. The List::MoreUtils module's uniq function can then quickly determine if the letters vary, and they can be transposed into the resulting hash easily.

use strict;
use warnings;
use Data::Dumper;
use List::MoreUtils qw(uniq);

my %hash = ( name1 => ['A', 'A', 'B', 'A', 'A', 'B'],
             name2 => ['A', 'A', 'D', 'A', 'A', 'B'],
             name3 => ['A', 'A', 'B', 'A', 'A', 'C'],
);
my @keys = keys %hash;
my $len = $#{ $hash{$keys[0]} };   # max index
my %new;

for my $i (0 .. $len) {
    my @col;
    for my $key (@keys) {
        push @col, $hash{$key}[$i];
    }
    if (uniq(@col) != 1) {     # check for variation
        for (0 .. $#col) {
            $new{$keys[$_]} .= $col[$_];
        }
    }
}
print Dumper \%new;

Output:

$VAR1 = {
          'name2' => 'DB',
          'name1' => 'BB',
          'name3' => 'BC'
        };
TLP

Great. Many thanks for all your help in this question.

I tried code based on TLP's suggestion and it worked just fine. Because I'm relatively new to Perl, I found that code easier to understand than Borodin's. What I did was:

#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);

my %hash = ( name1 => ['A', 'A', 'T', 'A', 'A', 'T', 'N', 'd', 'd', 'D', 'C', 'T', 'T', 'T'],
         name2 => ['A', 'A', 'D', 'A', 'A', 'T', 'A', 'd', 'a', 'd', 'd', 'T', 'T', 'C'],
         name3 => ['A', 'A', 'T', 'A', 'A', 'C', 'A', 'd', 'd', 'D', 'C', 'T', 'C', 'T'],
);
my @keys = keys %hash;
my $len = $#{ $hash{$keys[0]} };   # max index
my %new;

for (my $i=0; $i<=$len; $i++) {     # $len is the max index, so use <=
    my @col;
    for my $key (@keys) {
       if ($hash{$key}[$i] =~ /[ABCDT]/) {     #added a pattern match
            push @col, $hash{$key}[$i];
       }
    }
    if (uniq(@col) != 1) {     # check for variation
        for (0 .. $#col) {
            $new{$keys[$_]} .= $col[$_];
        }
    }
}
foreach my $key (sort keys %new ) {
    print "$key: $new{$key}\n";
}

However, when playing with the uniq function (if (uniq(@col) == 1)), I noticed that the output was a little buggy:

name1: AAAAADCT
name2: AAAAADCT
name3: AAAAT

It seems that it is not preserving the initial order of keys => values. Does anyone have a hint about this?
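
One possible explanation, offered as a sketch and not a definitive diagnosis: once the pattern match skips a letter, @col has fewer entries than @keys, so $col[$_] no longer belongs to $keys[$_]. The version below remembers which key each accepted letter came from. The names `@idx` and `%seen` are introduced here for illustration, a plain hash stands in for uniq to avoid the module dependency, and it assumes that non-matching letters should simply be left out of the result.

```perl
use strict;
use warnings;

my %hash = ( name1 => ['A', 'A', 'T', 'A', 'A', 'T', 'N', 'd', 'd', 'D', 'C', 'T', 'T', 'T'],
             name2 => ['A', 'A', 'D', 'A', 'A', 'T', 'A', 'd', 'a', 'd', 'd', 'T', 'T', 'C'],
             name3 => ['A', 'A', 'T', 'A', 'A', 'C', 'A', 'd', 'd', 'D', 'C', 'T', 'C', 'T'],
);
my @keys = sort keys %hash;          # sorted so the key order is predictable
my $len  = $#{ $hash{$keys[0]} };    # max index, assuming rows of equal length
my %new;

for my $i (0 .. $len) {
    my (@idx, %seen);                # names chosen for this sketch
    for my $n (0 .. $#keys) {
        my $letter = $hash{ $keys[$n] }[$i];
        next unless defined $letter && $letter =~ /[ABCDT]/;
        push @idx, $n;               # remember WHICH key the letter came from
        $seen{$letter}++;            # count distinct accepted letters
    }
    if (scalar(keys %seen) == 1) {   # same experiment as uniq(@col) == 1
        $new{ $keys[$_] } .= $hash{ $keys[$_] }[$i] for @idx;
    }
}
print "$_: $new{$_}\n" for sort keys %new;
```

With this change, each letter is appended to the key it actually came from, even when another key's letter in the same column was filtered out.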

Cheers.

PedroA