Your question is how to get the five highest values from your hash. You have this code:
my @keys = sort {
    $words{$b} <=> $words{$a}
        or
    "\L$a" cmp "\L$b"
} keys %words;
That already gives you the hash keys sorted by value, highest first. Why not simply take the top five keys from there?
my @highest = splice @keys, 0, 5; # destructive: also removes these keys from @keys
my @highest = @keys[0..4];        # or: non-destructive array slice
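One thing to watch with the slice: if the hash has fewer than five keys, @keys[0..4] pads the result with undef. Here is a minimal sketch of the whole step with a guard for that case. The sample data in %words and the $last variable are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real %words comes from your word count.
my %words = ( the => 7, Perl => 5, hash => 5, code => 3, word => 2, top => 1 );

my @keys = sort {
    $words{$b} <=> $words{$a}    # highest count first
        or
    "\L$a" cmp "\L$b"            # ties broken alphabetically, ignoring case
} keys %words;

# Clamp the upper index so that a hash with fewer than five keys
# does not produce undef entries in the slice.
my $last    = $#keys < 4 ? $#keys : 4;
my @highest = @keys[ 0 .. $last ];

print "$_: $words{$_}\n" for @highest;
```

With this sample data, "hash" sorts before "Perl" because the tie on 5 is broken case-insensitively.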
Also some comments on your code:
open( my $filehandle0, '<', $file0 ) || die "Could not open $file0\n";
It is a good idea to include the error message $! in your die statement, since it tells you why the open failed.
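A small sketch of what that can look like. The open_for_reading helper and the file name are hypothetical, only there so the example runs on its own and shows the message without needing your input file.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original code.
sub open_for_reading {
    my ($file) = @_;
    open( my $fh, '<', $file )
        or die "Could not open '$file': $!\n";    # $! holds the system error text
    return $fh;
}

# Try a path that presumably does not exist, and trap the die with eval
# so the message can be printed instead of ending the program.
my $fh = eval { open_for_reading('no/such/file.txt') };
print "Error: $@" unless $fh;
```

The exact text of $! depends on the system and locale, but it will name the actual reason, such as a missing file or missing permissions.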
for (@words) {
    s/[\,|\.|\!|\?|\:|\;|\"]//g;
}
As I said in the comments, you do not need to escape these characters or use alternation inside a character class; there, | is just a literal pipe character. Use either:
s/[,.!?:;"]//g for @words; #or
tr/,.!?:;"//d for @words;
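To show that the two forms do the same thing, here is a short sketch on made-up sample words. Note one difference from your original regex: because | is a literal inside brackets, your version also removed pipe characters, while these do not.

```perl
use strict;
use warnings;

# Hypothetical sample words.
my @words = ( 'Hello,', 'world!', '"quoted"', 'a|b', 'end.' );

# Work on copies so both variants start from the same input.
my @with_s  = @words;
my @with_tr = @words;

s/[,.!?:;"]//g for @with_s;     # regex substitution
tr/,.!?:;"//d  for @with_tr;    # transliteration with delete

print "@with_s\n";
print "@with_tr\n";
```

Both lines print the same list, with the pipe in 'a|b' left in place. For plain character deletion like this, tr is usually the lighter tool since it does not involve the regex engine.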
This next part is a bit odd.
my @stopwords;
while ( my $line = <$filehandle1> ) {
    chomp $line;
    my @linearray = split( " ", $line );
    push( @stopwords, @linearray );
}
for my $w ( my @stopwords ) {
    s/\b\Q$w\E\B//ig;
}
You read in the stopwords from a file... and then you delete the stopwords from $_? Are you even using $_ at this point? Moreover, you are redeclaring the @stopwords array in the loop header, which means the loop iterates over a new, empty array and never runs. This error is silent, it seems, so you might never notice.
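The smallest fix for the loop is to drop the second my and write for my $w (@stopwords). If what you actually want is to drop stopwords from your word list (that is my guess at the intent, not something your code states), a hash lookup is a common alternative to running one regex per stopword. A sketch with made-up data; %is_stopword and @filtered are names I invented:

```perl
use strict;
use warnings;

# Hypothetical data standing in for the contents of the two input files.
my @stopwords = qw(the a of);
my @words     = qw(The count of a word in the text);

# Build a lookup hash once; lc makes the comparison case-insensitive.
my %is_stopword = map { lc($_) => 1 } @stopwords;

# Keep only the words that are not stopwords.
my @filtered = grep { !$is_stopword{ lc $_ } } @words;

print "@filtered\n";
```

This compares whole words only, so it will not strip a stopword out of the middle of a longer word the way an unanchored substitution might.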
my %words = %words_count;
Here you make a copy of %words_count, which seems to be redundant, since you never use it again. If you have a big hash, this can decrease performance.
my $key_count = 0;
$key_count = keys %words;
This can be done in one line: my $key_count = keys %words. More readable, in my opinion.
$value_count = $words{$key} + $value_count;
Can also be abbreviated with the += operator: $value_count += $words{$key}
It is very good that you use strict and warnings.