I have the following dataset:

$name   $id     $value

A       abc     2.1
A       pqr     5.9
A       xyz     5.6
B       twg     2.5
B       ysc     4.7
C       faa     4.7
C       bar     2.4
D       foo     1.2
D       kar     0.3
D       tar     3.5
D       zyy     0.1

For each $name, I need to extract the $id with the highest $value. I tried something like this:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my $infile;
my %multi_hash;
open ($infile, "test.txt") || die "can't open $infile\n";

while (my$line=<$infile>) {
     my($name,$id,$val)= split(/\t/, $line);
     $multi_hash{$name}{$id}=$val;
   }

# print Dumper \%multi_hash;
    foreach my $name_1(sort keys %multi_hash){
            foreach my $id_1 (keys %{$multi_hash{$name_1}}) {
                    print "$name_1\t$id_1\t$multi_hash{$name_1}{$id_1}";
            }
    }

I want the output to be:

 A  pqr  5.9
 B  ysc  4.7
 C  faa  4.7
 D  tar  3.5

What my program prints is simply every row that is already in the input file, not just the row with the highest value for each name.

Could anyone help me improve my program?

toolic
harsh

2 Answers

Using the command line:

perl -lane'
  $_->{m}<$F[2] and @$_{"s","m"} = @F[1,2] for $h{$F[0]};
END {
  print join" ", $_, @{$h{$_}}{"s","m"} for sort keys %h
}
' file

Output:

A pqr 5.9
B ysc 4.7
C faa 4.7
D tar 3.5

Script equivalent:

local $\ = "\n"; # adds newline to print statements
my %h;
while (<>) {
  chomp;
  my @F = split ' ', $_;       # split columns on white spaces
  for my $r ($h{$F[0]}) {      # from now on, use $r as reference to $h{$F[0]}
    if ($r->{m} < $F[2]) {
      $r->{s} = $F[1];
      $r->{m} = $F[2];
    }
  }

}

for my $k (sort keys %h) {
  my $s = $h{$k}{s};
  my $m = $h{$k}{m};
  print join " ", $k, $s, $m;
}
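
The loop above compares against a maximum that has not been set yet, which works because Perl treats undef as 0 in a numeric comparison; a name whose values are all negative would never be recorded. As a minimal sketch of the same single-pass idea that runs cleanly under strict and warnings, the version below checks explicitly whether a name has been seen before. The helper name max_per_name and the sample rows are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of "name id value" lines and returns
# a hash ref mapping each name to [id, value] for its largest value.
sub max_per_name {
    my @lines = @_;
    my %best;
    for my $line (@lines) {
        my ($name, $id, $val) = split ' ', $line;
        next unless defined $val;    # skip blank or incomplete lines
        # Record the row if it is the first for this name or a new maximum.
        if (!exists $best{$name} || $val > $best{$name}[1]) {
            $best{$name} = [$id, $val];
        }
    }
    return \%best;
}

# Illustrative rows only; the "N" group shows that negative values are handled.
my $best = max_per_name("A abc 2.1", "A pqr 5.9", "N neg -3", "N low -7");
print join(" ", $_, @{ $best->{$_} }), "\n" for sort keys %$best;
```

When two rows of the same name tie, this sketch keeps the one seen first, since the comparison is a strict greater-than.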
mpapec

See perldoc -q sort for how to sort a hash by value.

use warnings;
use strict;

my %multi_hash;
while (<DATA>) {
    my ($name,$id,$val) = split;
    $multi_hash{$name}{$id} = $val;
}

for my $name_1 (sort keys %multi_hash) {
    my %h = %{ $multi_hash{$name_1} };
    my $key = (reverse sort { $h{$a} <=> $h{$b} } keys %h)[0];
    print "$name_1\t$key\t$multi_hash{$name_1}{$key}\n";
}

__DATA__
A       abc     2.1
A       pqr     5.9
A       xyz     5.6
B       twg     2.5
B       ysc     4.7
C       faa     4.7
C       bar     2.4
D       foo     1.2
D       kar     0.3
D       tar     3.5
D       zyy     0.1

Or, without the intermediate hash:

for my $name_1 (sort keys %multi_hash) {
    my $key = (reverse sort { $multi_hash{$name_1}{$a} <=> $multi_hash{$name_1}{$b} } keys %{ $multi_hash{$name_1} })[0];
    print "$name_1\t$key\t$multi_hash{$name_1}{$key}\n";
}
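
Sorting every key list just to take its first element does more work than needed. A possible alternative, sketched here with reduce from the core List::Util module, walks each inner hash once and keeps the key with the larger value. The helper name max_key and the abbreviated sample hash are assumptions for illustration.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical helper: given a hash ref of id => value, return the id
# with the largest value without sorting the whole key list.
sub max_key {
    my ($h) = @_;
    return reduce { $h->{$a} >= $h->{$b} ? $a : $b } keys %$h;
}

# Abbreviated sample data in the same shape as %multi_hash above.
my %multi_hash = (
    A => { abc => 2.1, pqr => 5.9, xyz => 5.6 },
    B => { twg => 2.5, ysc => 4.7 },
);

for my $name (sort keys %multi_hash) {
    my $key = max_key($multi_hash{$name});
    print "$name\t$key\t$multi_hash{$name}{$key}\n";
}
```

For an empty inner hash, reduce returns undef, so a caller that may see empty groups would need to check for that before printing.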
toolic