5

Below is a small Perl script. Every time I run it, I get different output.

Can anyone help me understand the basics of how Perl stores hash variables, that is, how the key/value pairs of a hash are indexed?

#!/usr/bin/perl

%data = ('John Paul' => 45, 'Lisa' => 30, 'Kumar' => 40);
@names = keys %data;
print "$names[0]\n";
print "$names[1]\n";
print "$names[2]\n";
ikegami
Ataul Haque
  • I think this is an interesting question because of the subtle implication of *what* this means Perl is doing - and why. However, in general, hash order is "not reliable" so while it may expose an interesting artifact, it is doing so under the general "don't do that" category - which is likely the cause of downvote(s). – user2864740 Feb 18 '15 at 19:44
  • Reproducible https://ideone.com/IXwgWz (more names added so the different order is more visible more often) – user2864740 Feb 18 '15 at 19:49
  • possible duplicate of [What decides the order of keys when I print a Perl hash?](http://stackoverflow.com/questions/2011342/what-decides-the-order-of-keys-when-i-print-a-perl-hash) – hmatt1 Feb 18 '15 at 19:53
  • I do not believe it is a duplicate of that question (of which the answers do cover the "don't do that" category), for reasons previously stated. This is a *specific* situation about *why the order differs between runs* and is tied to a specific implementation detail. This behavior is "relatively new". – user2864740 Feb 18 '15 at 20:14
  • @user2864740, It's actually quite old. It's at least 12 years old since env var `PERL_HASH_SEED` was added in 5.8.1. The only difference is that it now happens more often. It used to be some specific conditions had to happen before a hash would become salted, but now all hashes are salted from the start. – ikegami Feb 18 '15 at 20:23
  • @ikegami "relatively new" ;-) given the static [although not necessarily bad] changes since Perl 6 was .. uhh, announced. And yes, this dates me quite a bit.. in any case, a bit of new trivia filed. – user2864740 Feb 18 '15 at 20:29

2 Answers

12

The behaviour is documented in perlsec's Algorithmic Complexity Attacks.


A hash is an array of linked lists. A hashing function converts the key into a number which is used as the index of the array element ("bucket") into which to store the value. More than one key can hash to the same index ("collision"), a situation handled by the linked lists.
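The structure described above can be sketched as a toy model. This is only an illustration: Perl's real hashes are implemented in C with a different hash function, the names `toy_hash`, `toy_store` and `toy_fetch` are hypothetical, and each bucket is a plain Perl array standing in for a linked list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model only; not how perl implements hashes internally.
my $NUM_BUCKETS = 4;

# Illustrative hash function: sum of character codes, modulo the bucket count.
sub toy_hash {
    my ($key) = @_;
    my $sum = 0;
    $sum += ord($_) for split //, $key;
    return $sum % $NUM_BUCKETS;
}

# An array of buckets; each bucket is a list of [key, value] pairs.
my @buckets = map { [] } 1 .. $NUM_BUCKETS;

sub toy_store {
    my ($key, $value) = @_;
    my $chain = $buckets[ toy_hash($key) ];
    for my $pair (@$chain) {
        # Key already present: overwrite its value.
        if ($pair->[0] eq $key) { $pair->[1] = $value; return; }
    }
    # New key (possibly colliding with others): append to the chain.
    push @$chain, [ $key, $value ];
}

sub toy_fetch {
    my ($key) = @_;
    for my $pair (@{ $buckets[ toy_hash($key) ] }) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return;    # not found
}

toy_store('John Paul', 45);
toy_store('Lisa',      30);
toy_store('Kumar',     40);

print "Kumar => ", toy_fetch('Kumar'), "\n";    # prints "Kumar => 40"
```

A lookup only has to scan one chain, which is why a hash stays fast as long as the keys spread evenly across the buckets.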

If a malicious user knew the hashing algorithm, he could devise values that would hash to the same index, causing the hash to degenerate into a linked list. This can lead to huge performance drops in some applications, and thus can be used as part of a denial of service (DoS) attack.

Two measures are taken to avoid that. One is to salt the hashing algorithm to randomize the order in which elements are stored, and the other makes it harder to detect the salt by perturbing the order in which the iterator visits the hash elements.
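To see why a salt changes the order of `keys`, here is a rough sketch that mixes a seed into a toy hash function and then lists the keys in bucket order. The function names, the multiplier 33 and the modulus are assumptions chosen for illustration; this does not reproduce perl's actual hash function, and it models only the salt, not the iterator perturbation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $NUM_BUCKETS = 8;

# Hypothetical salted hash: the seed is the starting state of a simple
# multiplicative hash, so different seeds give different bucket indexes.
sub salted_hash {
    my ($key, $seed) = @_;
    my $h = $seed;
    $h = ($h * 33 + ord($_)) % 1_000_003 for split //, $key;
    return $h % $NUM_BUCKETS;
}

# Return the keys in the order a walk over the buckets would visit them.
sub toy_key_order {
    my ($seed, @keys) = @_;
    my @buckets = map { [] } 1 .. $NUM_BUCKETS;
    push @{ $buckets[ salted_hash($_, $seed) ] }, $_ for @keys;
    return map { @$_ } @buckets;
}

my @keys = ('John Paul', 'Lisa', 'Kumar');

# A fresh random seed per iteration stands in for a fresh salt per process.
for my $seed (map { int rand 1_000_003 } 1 .. 3) {
    print join(', ', toy_key_order($seed, @keys)), "\n";
}
```

With the same seed the order is always the same; only a change of seed can reshuffle the keys, which mirrors why the original script prints a different order on different runs.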

$ perl -E'
   my @k = "a".."z";
   for (1..3) {
      my %h = map { $_ => 1 } @k;
      say keys %h;
   }
'
iocmbygdkranwxfejuqpzvltsh
bmcoigdywrankujfxezpqlvths
juexfwarnkgdybmcoihstlvzpq
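If the script from the question needs the same output on every run, one common approach is to impose an order explicitly rather than rely on the hash's internal layout. A minimal sketch, reusing the question's data and assuming plain string ordering is acceptable:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %data = ('John Paul' => 45, 'Lisa' => 30, 'Kumar' => 40);

# sort returns the keys in string order, independent of how the hash
# happens to be laid out in this particular process.
my @names = sort keys %data;

print "$_ => $data{$_}\n" for @names;
# Prints:
# John Paul => 45
# Kumar => 40
# Lisa => 30
```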
ikegami
  • (The vulnerability mitigated is known as a "Hash [Flood] DoS".) – user2864740 Feb 18 '15 at 20:19
  • Plus, if you *need* order *and* you need a "hash", there are various ways of doing that, among which: [`Hash::Ordered`](https://metacpan.org/pod/Hash::Ordered). – G. Cito Feb 19 '15 at 02:46
  • @ikegami: What perl version are you using? On my system, perl v5.14.2, the output is `wraxdjyukhgftienvmslcpqbzo` once and `wraxdjyukhgftienvmslpcqbzo` twice (or more if I increase the for loop range). Why does this happen? – cuonglm Feb 20 '15 at 20:28
  • @cuonglm, You can get different orderings in 5.14 between runs, but some specific conditions have to happen first. The frequency at which you get different orderings changed in 5.18. – ikegami Feb 20 '15 at 20:36
  • @ikegami: Can you clarify what the `specific conditions` are? – cuonglm Feb 20 '15 at 20:40
  • @cuonglm, When a hash becomes degenerate (starts looking too much like a linked list), the salt of the hash's hash algorithm is changed. – ikegami Feb 20 '15 at 20:41
  • @ikegami: Ah, of course, I know that. I only meant: how can I get different output each time, like yours? I changed to perl 5.20 and got the expected output. And after reading the documentation again, these semantics apply to perl 5.18 and above. Older versions need `PERL_HASH_SEED` and `PERL_PERTURB_KEYS` to be set. – cuonglm Feb 20 '15 at 20:53
  • @cuonglm, No, old versions didn't need `PERL_HASH_SEED` to be set to enable the salting (it is usually used to *disable* the salting when debugging hashes), and didn't have `PERL_PERTURB_KEYS`. – ikegami Feb 20 '15 at 21:07
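The environment variables mentioned in the comments above can be tried out with a small experiment. The sketch below starts child perl processes with `PERL_HASH_SEED` and `PERL_PERTURB_KEYS` both set to `0`, which is the usual way to get a repeatable key order while debugging. The helper name `key_order_in_child` is hypothetical, and the sketch assumes a perl that honours these variables and supports list-form pipe `open`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a child perl with extra environment settings and return the key
# order it reports. $^X is the path of the currently running perl.
sub key_order_in_child {
    my (%env) = @_;
    # keys and values of an unmodified hash line up, so this slice is safe.
    local @ENV{ keys %env } = values %env;
    my $code = 'my %h = map { $_ => 1 } "a".."z"; print join "", keys %h;';
    open(my $fh, '-|', $^X, '-e', $code) or die "cannot run $^X: $!";
    my $order = <$fh> // '';
    close $fh;
    return $order;
}

# Assumed debugging settings: fixed seed, no iterator perturbation.
my %fixed = (PERL_HASH_SEED => '0', PERL_PERTURB_KEYS => '0');

print key_order_in_child(%fixed), "\n" for 1 .. 3;
```

With these settings each child should report the same order; calling `key_order_in_child()` without arguments on perl 5.18 or later should show the order varying again.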
3

This behavior is described in `perldoc -f keys`:

Hash entries are returned in an apparently random order. The actual random order is specific to a given hash; the exact same series of operations on two hashes may result in a different order for each hash. Any insertion into the hash may change the order, as will any deletion, with the exception that the most recent key returned by each or keys may be deleted without changing the order. So long as a given hash is unmodified you may rely on keys, values and each to repeatedly return the same order as each other.

.. in order to prevent Algorithmic Complexity Attacks
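The guarantee in the last sentence of the quoted passage is the part that code can safely rely on. A minimal sketch using the question's data: as long as the hash is not modified between the calls, the i-th element of `keys` belongs to the i-th element of `values`, whatever the order turns out to be.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %data = ('John Paul' => 45, 'Lisa' => 30, 'Kumar' => 40);

# %data is not modified between these two calls, so the lists line up.
my @k = keys %data;
my @v = values %data;

for my $i (0 .. $#k) {
    # The order of the lines may differ between runs; the pairing does not.
    print "$k[$i] => $v[$i]\n";
}
```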

mpapec
  • The behavior implies that the (re)hash function used by Perl is itself non-deterministic across processes (which is not in conflict with, but not fully covered by, the quote) and should probably be covered in greater detail. – user2864740 Feb 18 '15 at 19:41
  • That passage does not explain why it differs between runs. – ikegami Feb 18 '15 at 20:01
  • @Сухой27, Rather than replacing everything, I have posted my own answer. – ikegami Feb 18 '15 at 20:14
  • Correct documentation link: [Algorithmic Complexity Attacks](http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-Attacks) – ikegami Feb 18 '15 at 20:34
  • @ikegami tnx for the link. – mpapec Feb 18 '15 at 20:37
  • Re "in order to", That quoted passage is true even when no measures are taken to prevent the "algorithmic complexity attacks". – ikegami Feb 18 '15 at 20:40