What is the easiest way to get a key with the highest value from a hash in Perl?
8 Answers
While the solution with sort:
(sort {$hash{$b} <=> $hash{$a}} keys %hash)[0]
found in some of the other answers is quite elegant, it doesn't perform as well as it looks. First, the sort transforms an O(n) search operation into an O(n log n) one. Second, the sort solution performs O(n log n) hash look-ups. Hash look-ups are very fast for certain operations, but when working with the entire hash, look-ups will be slower than using each, keys, or values to iterate through the data structure. This is because the iterators do not need to calculate the hashes of keys, nor do they need to repeatedly walk through bins to find the values. And the overhead is not constant, but grows as the hashes get larger.
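To make the contrast concrete, here is a small self-contained sketch (with an illustrative three-key hash, not taken from the answer) of the two access patterns described above; both compute the same sum, but the second never re-hashes a key:

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);

# Repeated look-ups: each $hash{$_} re-hashes the key and walks its bucket.
my $sum_lookup = 0;
$sum_lookup += $hash{$_} for keys %hash;

# Iterator: each() walks the internal structure directly, no re-hashing.
my $sum_each = 0;
keys %hash;    # reset the each iterator
while (my (undef, $val) = each %hash) {
    $sum_each += $val;
}

print "$sum_lookup $sum_each\n";    # prints '6 6'
```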
Here are a few faster solutions:
use strict;
use warnings;

my %hash = (
    small   => 1,
    medium  => 5,
    largest => 10,
    large   => 8,
    tiny    => 0.1,
);
Here is a solution using the each iterator (an O(1) operation done n times):
sub largest_value (\%) {
    my $hash = shift;
    keys %$hash;    # reset the each iterator

    my ($large_key, $large_val) = each %$hash;

    while (my ($key, $val) = each %$hash) {
        if ($val > $large_val) {
            $large_val = $val;
            $large_key = $key;
        }
    }
    $large_key
}
print largest_value %hash; # prints 'largest'
Or a faster version that trades memory for speed (it makes a copy of the hash):
sub largest_value_mem (\%) {
    my $hash = shift;
    my ($key, @keys) = keys %$hash;
    my ($big, @vals) = values %$hash;

    for (0 .. $#keys) {
        if ($vals[$_] > $big) {
            $big = $vals[$_];
            $key = $keys[$_];
        }
    }
    $key
}
print largest_value_mem %hash; # prints 'largest'
Here is the performance with various hash sizes:
10 keys:
                       Rate  largest_with_sort  largest_value  largest_value_mem
largest_with_sort  111565/s                 --            -8%               -13%
largest_value      121743/s                 9%             --                -5%
largest_value_mem  127783/s                15%             5%                 --

50 keys:
                       Rate  largest_with_sort  largest_value  largest_value_mem
largest_with_sort   24912/s                 --           -37%               -40%
largest_value       39361/s                58%             --                -6%
largest_value_mem   41810/s                68%             6%                 --

100 keys:
                       Rate  largest_with_sort  largest_value  largest_value_mem
largest_with_sort    9894/s                 --           -50%               -56%
largest_value       19680/s                99%             --               -12%
largest_value_mem   22371/s               126%            14%                 --

1,000 keys:
                       Rate  largest_with_sort  largest_value  largest_value_mem
largest_with_sort     668/s                 --           -69%               -71%
largest_value        2183/s               227%             --                -7%
largest_value_mem    2341/s               250%             7%                 --

10,000 keys:
                       Rate  largest_with_sort  largest_value  largest_value_mem
largest_with_sort    46.5/s                 --           -79%               -81%
largest_value         216/s               365%             --               -11%
largest_value_mem     242/s               421%            12%                 --
As you can see, if memory isn't much of an issue, the version with internal arrays is fastest, closely followed by the each iterator, with sort a distant third.
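Numbers like these can be produced with the Benchmark module's cmpthese. The harness below is a sketch only: the original post does not show its benchmark code, so the hash contents and iteration count here are assumptions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data -- the original benchmark's hash contents are not shown.
my %hash = map { "key$_" => rand() } 1 .. 100;

sub largest_with_sort {
    return (sort { $hash{$b} <=> $hash{$a} } keys %hash)[0];
}

sub largest_value {
    keys %hash;    # reset the each iterator
    my ($large_key, $large_val) = each %hash;
    while (my ($key, $val) = each %hash) {
        ($large_key, $large_val) = ($key, $val) if $val > $large_val;
    }
    return $large_key;
}

# Run each sub 1000 times and print a comparison table like the one above.
cmpthese(1_000, {
    largest_with_sort => \&largest_with_sort,
    largest_value     => \&largest_value,
});
```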

- Thorough answer. One comment though: the amortized complexity of a hash lookup is O(1), not O(log n). – jkasnicki May 22 '10 at 15:10
- Comparing real-world speeds of hash look-up to array look-up still shows a nonlinear relationship: with 10 elements, an array is 50% faster than a hash; with 10,000 elements it is 100% faster; with 1,000,000 elements it is 210% faster. – Eric Strom May 22 '10 at 17:16
Not sure why everyone is doing this by hand...
use List::Util qw( reduce );
my $max_val_key = reduce { $hash{$a} > $hash{$b} ? $a : $b } keys %hash;
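This one-liner can be exercised with sample data along the lines of the top answer's hash (a minimal self-contained sketch):

```perl
use strict;
use warnings;
use List::Util qw( reduce );

my %hash = ( small => 1, medium => 5, largest => 10, large => 8 );

# reduce keeps whichever of each pair of keys maps to the larger value.
my $max_val_key = reduce { $hash{$a} > $hash{$b} ? $a : $b } keys %hash;

print "$max_val_key\n";    # prints 'largest'
```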

The following is more space-efficient and will run in O(n) instead of O(n log n) as compared to the other answers which sort the hash. It assumes values are integers greater than 0 and the hash is not empty, but should be easily extended for your case.
my $key_for_max_value;
my $max_value = -1;

while (my ($key, $value) = each %hash) {
    if ($value > $max_value) {
        $max_value         = $value;
        $key_for_max_value = $key;
    }
}
$key_for_max_value will now be the key corresponding to the highest value.

- There is an assumption in your code that the values of the hash aren't all negative numbers less than -1. You should just make $max_value the value of the first thing seen or something. – May 22 '10 at 04:18
- Nice to know _someone_ out there still appreciates efficiency over short-handedness. Good explanation, too. – amphetamachine May 22 '10 at 04:42
- @Kinopiko: And that can be done with something like `my $max_value = undef;` and later, change the `if` to `if (! defined $max_value || $value > $max_value)`. – Robert P May 22 '10 at 07:33
- @amphetamachine for reasonably sized data sets, this solution is very likely to be slower than the one using `sort`. – hobbs May 22 '10 at 08:07
- @Alnitak by having a smaller constant factor. Let f(n) = n * log(n) / log(10) and g(n) = n * 1000000. f(n) = O(n log n) and g(n) = O(n). Now let n = 10: f(10) is ten, and g(10) is ten million. Furthermore, f(n) will be less than g(n) as long as n is less than ten to the millionth power, despite the fact that f(n) dominates g(n). – hobbs May 22 '10 at 10:31
- (It should be noted that since log n is a fairly slow-growing function, O(n) and O(n log n) are "not very different", meaning it doesn't take a very large constant-factor advantage for an O(n) function to beat an O(n log n) one at small n.) – hobbs May 22 '10 at 10:44
- @hobbs I don't think this solution will ever be slower than one involving sorting. Your argument is valid in general (constant factors can make O(n log n) preferable for small n), but in this case the constant factor on the O(n) solution is small: we look at each element exactly once and do a very small amount of computation with it. Finally, the real win of this solution is the space savings: sorting takes O(n) space, while this solution takes O(1) space. See @Eric Strom's answer for another discussion and performance numbers. – jkasnicki May 22 '10 at 15:01
- @jkasnicki - well put. Sure, there _might_ be particular cases where `O(n log n)` is smaller than `O(n)` (for small values of `n`), but this isn't one of them! – Alnitak May 22 '10 at 17:04
- @jkasnicki: Put a short-circuit operator to define `$max_value` on the first pass: `$max_value ||= $value;`. That way you can get rid of the `-1` assumption. – Zaid May 23 '10 at 10:14
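Folding the comments' suggestions back into the answer, a version with no assumptions about the values (a self-contained sketch with illustrative all-negative data) looks like:

```perl
use strict;
use warnings;

# All-negative values, the case the comments above warn about.
my %hash = ( a => -5, b => -2, c => -10 );

my ($key_for_max_value, $max_value);

while (my ($key, $value) = each %hash) {
    # The defined-guard replaces the -1 sentinel, per Robert P's comment.
    if (!defined $max_value || $value > $max_value) {
        $max_value         = $value;
        $key_for_max_value = $key;
    }
}

print "$key_for_max_value\n";    # prints 'b'
```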
The keys sorted by value, from lowest to highest:
sort { $hash{$a} <=> $hash{$b} } keys %hash
The keys sorted by value, from highest to lowest:
reverse sort { $hash{$a} <=> $hash{$b} } keys %hash
And the first element:
(reverse sort { $hash{$a} <=> $hash{$b} } keys %hash)[0]
Replace the spaceship with cmp
to taste.
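For example, if the values were strings rather than numbers (an illustrative hash, not from the original), the same pattern with cmp would be:

```perl
use strict;
use warnings;

my %hash = ( a => 'apple', b => 'pear', c => 'banana' );

# Keys sorted by string value, last-in-dictionary-order first:
my @by_value = reverse sort { $hash{$a} cmp $hash{$b} } keys %hash;

print "$by_value[0]\n";    # prints 'b' (its value 'pear' sorts last)
```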

- Because he wants the key, not the value. The value is what to sort by; the key is what to return. Unless I am misreading the question. – jrockway May 22 '10 at 04:05
keys %hash;    # reset the each iterator
my ($max_key, $max_val) = each %hash or die "hash is empty";

while (my ($key, $val) = each %hash) {
    ($max_key, $max_val) = ($key, $val) if $val > $max_val;
}

my $highest_val = (sort { $hash{$b} <=> $hash{$a} } keys %hash)[0];

- That returns the key that's the highest value. I assume he wants the key that maps to the highest value. Otherwise, the question is too simple to be asking :) (And in that case, why not just "reverse sort keys %hash"?) – jrockway May 22 '10 at 03:58
- It depends what you mean by "value" here. Usually a hash is thought of as key/value pairs, so I'd assume the same thing as jrockway. But it could also mean what amphetamachine said. The questioner should clarify. – May 22 '10 at 04:01
- @jrockway - `And in that case, why not just "reverse sort keys %hash"?` - Because that's a lexical sort, and `sort {$b <=> $a}` hits two birds with one stone in that it's both a numeric sort AND it's reversed. – amphetamachine May 22 '10 at 04:28
my $highest_val = (sort { $hash{$b} <=> $hash{$a} } keys %hash)[0];
is likely to be what you want.
If you have a very large hash, you might want to use something like a Schwartzian transform:
my @array = map { [ $hash{$_}, $_ ] } keys %hash;
my $key_with_highest_value = (sort { $b->[0] <=> $a->[0] } @array)[0]->[1];
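As a quick check of the transform with sample data in the spirit of the question (a self-contained sketch; note the descending sort so index 0 holds the largest value):

```perl
use strict;
use warnings;

my %hash = ( small => 1, medium => 5, largest => 10, large => 8 );

# Build [value, key] pairs once, so the sort does no hash look-ups at all.
my @array = map { [ $hash{$_}, $_ ] } keys %hash;
my $key_with_highest_value = (sort { $b->[0] <=> $a->[0] } @array)[0]->[1];

print "$key_with_highest_value\n";    # prints 'largest'
```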

- This is more typing, but is O(n) instead of O(n log n), which is generally a good thing. If your list is big. – jrockway May 22 '10 at 04:06
- The Schwartzian transform here only serves to reduce the number of hash-table look-ups, and does **not** change the complexity of the search - it's still O(n log n). The iterative approach from @jkasnicki is superior. – Alnitak May 22 '10 at 09:03
If performance isn't an issue, I'd suggest a more literate programming solution.
use List::Util qw(max);

my $max_value = max values %hash;
my ($max_key) = grep { $hash{$_} == $max_value } keys %hash;
