
I have a hash that I sorted by values greatest to least. How would I go about getting the top 5? There was a post on here that talked about getting only one value.

What is the easiest way to get a key with the highest value from a hash in Perl?

I understand that approach. So would I, say, get the highest value, add it to an array, delete that element from the hash, and then repeat the process?

It seems like there should be an easier way to do this than that, though.

My hash is called %words.

Edit: Took out the code, as the question was answered without really needing it.

Kirs Kringle
  • "I have a hash that I sorted.." No. You cannot sort a hash. – TLP Dec 04 '12 at 04:28
  • No you cannot, I mean I printed them in order, but I want to take the top 5 with the highest value. – Kirs Kringle Dec 04 '12 at 04:35
  • `s/[\,|\.|\!|\?|\:|\;|\"]//g` You should not use alternations inside character class brackets, and you don't need to escape characters (except `-` and `]`): `s/[,.!?:;"]//g for @words`. Or use `tr/,.!?:;"//d`. – TLP Dec 04 '12 at 05:02
  • Uhm... you have already sorted the keys into the array `@keys`. You only need to take the first five elements from that array. – TLP Dec 04 '12 at 05:09

3 Answers


Your question is how to get the five highest values from your hash. You have this code:

my @keys = sort {
    $words{$b} <=> $words{$a}
    or
    "\L$a" cmp "\L$b"
} keys %words;

That gives you the hash keys sorted by value, highest first. Just take the top five keys from there, using either of these:

my @highest = splice @keys, 0, 5;  # also deletes the keys from the array
my @highest = @keys[0..4];         # non-destructive solution
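A minimal sketch of the slice approach, with a guard for hashes that hold fewer than five keys (a plain `@keys[0..4]` would pad the result with `undef` in that case). The sample `%words` data and the `$top_n` and `$last` variables are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real %words hash.
my %words = ( apple => 7, pear => 3, fig => 9, kiwi => 1, plum => 5, lime => 2 );

# Sort keys by descending count, breaking ties case-insensitively by name.
my @keys = sort {
    $words{$b} <=> $words{$a}
    or
    "\L$a" cmp "\L$b"
} keys %words;

# Take at most $top_n keys, never running past the end of @keys.
my $top_n   = 5;
my $last    = $#keys < $top_n - 1 ? $#keys : $top_n - 1;
my @highest = @keys[ 0 .. $last ];

print "$_: $words{$_}\n" for @highest;
```

With the sample data this prints fig, apple, plum, pear and lime with their counts, in that order.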

Also some comments on your code:

open( my $filehandle0, '<', $file0 ) || die "Could not open $file0\n";

It is a good idea to include the error message $! in your die statement to get valuable information for why the open failed.
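As a rough sketch of what that looks like, the helper below (`open_for_reading` is a hypothetical name, not something from the original code) adds `$!` to the message. The file name is assumed not to exist, so the failure path is the one that runs:

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or die with the reason.
sub open_for_reading {
    my ($file) = @_;
    open( my $fh, '<', $file )
        or die "Could not open '$file': $!\n";   # $! holds the system error text
    return $fh;
}

# Demonstration with a file name that is assumed not to exist.
my $fh = eval { open_for_reading('no_such_file.txt') };
print "Error: $@" unless $fh;
```

The exact wording of `$!` depends on the operating system, so the message is best treated as diagnostic text rather than something to match on.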

for (@words) {
    s/[\,|\.|\!|\?|\:|\;|\"]//g;
}

Like I said in the comment, you do not need to escape characters or use alternations in a character class bracket. Use either:

s/[,.!?:;"]//g for @words;   #or
tr/,.!?:;"//d  for @words;
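A quick way to convince yourself the two forms do the same job is to run both on copies of the same list. The sample words here are made up; note that `a|b` keeps its pipe, whereas the original character class would have stripped it as well:

```perl
use strict;
use warnings;

# Made-up sample words containing the punctuation in question.
my @words = ( 'Hello,', '"world!"', 'a|b', 'wait...' );

my @with_s  = @words;
my @with_tr = @words;

s/[,.!?:;"]//g for @with_s;    # regex character class
tr/,.!?:;"//d  for @with_tr;   # transliteration with the delete flag

# Both lines print: Hello world a|b wait
print "@with_s\n";
print "@with_tr\n";
```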

This next part is a bit odd.

my @stopwords;
while ( my $line = <$filehandle1> ) {
    chomp $line;
    my @linearray = split( " ", $line );
    push( @stopwords, @linearray );
}
for my $w ( my @stopwords ) {
    s/\b\Q$w\E\B//ig;
}

You read in the stopwords from a file... and then you delete the stopwords from $_? Are you even using $_ at this point? Moreover, you are redeclaring the @stopwords array in the loop header, which will effectively mean your new array will be empty, and your loop will never run. This error is silent, it seems, so you might never notice.
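If the intent was to drop the stopwords from the word list (an assumption on my part, since the original loop never touches `@words`), one possible sketch uses a lookup hash and `grep` instead of a regex. The data and the names `%is_stopword` and `@kept` are invented for the example:

```perl
use strict;
use warnings;

# Made-up data; in the original the stopwords are read from a file.
my @stopwords = qw(the a of);
my @words     = qw(The cat sat on a mat of straw);

# Build a lookup table, lowercased for case-insensitive matching.
my %is_stopword = map { ( lc($_) => 1 ) } @stopwords;

# Keep only the words that are not stopwords.
my @kept = grep { !$is_stopword{ lc $_ } } @words;

print "@kept\n";    # prints: cat sat on mat straw
```

Because the stopwords are only used as hash keys, there is no need for `\Q...\E` quoting here.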

my %words = %words_count;

Here you make a copy of %words_count, which seems to be redundant, since you never use it again. If you have a big hash, this can decrease performance.

my $key_count = 0;
$key_count = keys %words;

This can be done in one line: my $key_count = keys %words. More readable, in my opinion.

$value_count = $words{$key} + $value_count;

Can also be abbreviated with the += operator: $value_count += $words{$key}
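Put together, the two counters might look like the sketch below. The `%words` contents are made-up counts, and the statement-modifier loop is just one way to write the sum:

```perl
use strict;
use warnings;

my %words = ( apple => 7, pear => 3, fig => 9 );    # made-up counts

# keys in scalar context returns the number of keys.
my $key_count = keys %words;

# Add up all the values with +=.
my $value_count = 0;
$value_count += $words{$_} for keys %words;

print "$key_count distinct words, $value_count in total\n";    # 3 distinct words, 19 in total
```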

It is very good that you use strict and warnings.

TLP
  • You have been extremely helpful. You answered more than I was asking. Thank you for pointing out my flaws. I have used a lot of stuff I've read or seen online, and then adapted it to my code. I have really learned a lot in doing this project. Thank you for the help. Cheers. – Kirs Kringle Dec 05 '12 at 06:12
  • Although I'm not sure what is bad about my stopwords. – Kirs Kringle Dec 05 '12 at 06:16
  • A substitution is by default applied to `$_`, so `s/foo//` means `$_ =~ s/foo//`. In your loop, I don't know which `$_` you refer to. If you meant for it to apply to `$w`, then you are just deleting everything from the `@stopwords` array. – TLP Dec 05 '12 at 06:20
  • My intention with that was to make sure none of the stopwords are already declared operators in Perl. I read somewhere that it was a good practice to get into. However, maybe I applied it in the wrong way. – Kirs Kringle Dec 05 '12 at 06:27
  • I assume you mean meta characters. But that doesn't matter unless you use them in a regex. Any characters used in a hash key are literal, so no worries there. – TLP Dec 05 '12 at 06:30

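If performance isn't a big deal, sort the keys by value in descending order and slice off the first five:

(sort { $words{$b} <=> $words{$a} } keys %words)[0..4]

If you absolutely need killer speed, a selection sort which terminates after 5 iterations is probably the best thing for you. Note that this version deletes the selected keys from %words as it goes.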

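use feature 'say';

my @results;
for (1..5) {
  last unless %words;    # stop early if fewer than five keys remain
  my ($maxkey, $max);

  # Find the key with the largest remaining value.
  for my $key (keys %words) {
    if (!defined $max or $max < $words{$key}) {
      $maxkey = $key;
      $max = $words{$key};
    }
  }
  push @results, $maxkey;
  delete $words{$maxkey};    # remove it so the next pass finds the runner-up
}

say join(",", @results);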
daniel gratzer
  • Speed isn't really an issue now, but I will look into the speed way because what I'm doing on a large scale would need speed. Thank you for your help. – Kirs Kringle Dec 04 '12 at 04:24

There's a CPAN module for that, Sort::Key::Top. It has a straightforward interface and an efficient XS implementation:

use Sort::Key::Top qw(rnkeytop);

my @results = rnkeytop { $words{$_} } 5 => keys %words;
creaktive