
Perl 5.8.1 added hash randomization. I thought that Perl 5.8.2 then removed it unless an environment variable (PERL_HASH_SEED) was set. It now seems I was gravely mistaken, because

PERL_HASH_SEED=$SEED perl -MData::Dumper -e 'print Dumper{map{$_,1}"a".."z"}'

always returns the same key ordering regardless of the value of $SEED.

Did hash randomization go completely away, am I doing something wrong, or is this a bug?

Chas. Owens
  • See also: [Why do hash keys have different order when printing?](http://stackoverflow.com/q/30340027/2019415) – G. Cito Nov 10 '15 at 17:25

2 Answers


See Algorithmic Complexity Attacks:

In Perl 5.8.1 the hash function is randomly perturbed by a pseudorandom seed which makes generating such naughty hash keys harder. [...] but as of 5.8.2 it is only used on individual hashes if the internals detect the insertion of pathological data.

So randomization doesn't always happen, only when perl detects that it's needed.

Mat
  • From `perlrun` as well: `Most hashes by default return elements in the same order as in Perl 5.8.0. On a hash by hash basis, if pathological data is detected during a hash key insertion, then that hash will switch to an alternative random hash seed.` – Eric Strom Jul 13 '11 at 20:26
  • Drat, I had a nifty trick I wanted to use hash randomization for. – Chas. Owens Jul 13 '11 at 20:26
  • Reading perlrun, it seems to me it is saying if the env var is set it should be used for all non-pathological hashes, but that doesn't seem to be the behavior. – ysth Jul 13 '11 at 20:38
  • @Chas. Owens, It's NEVER reliably random, even when the seed takes effect. Use [List::Util](http://search.cpan.org/perldoc?List::Util)'s `shuffle` – ikegami Jul 13 '11 at 21:33
  • @ikegami Shuffle requires a full list to work; the trick was to randomize a hash's values without using more than two extra scalars. It would work on Perl 5.8.1, but only on Perl 5.8.1. – Chas. Owens Jul 14 '11 at 00:59
  • @Chas. Owens, No, no version of Perl makes key orderings random. At best bucket selection could be random. Those two things are not equivalent. – ikegami Jul 14 '11 at 01:14
  • @ikegami That depends on your definition of "random". In Perl 5.8.1, "the order varies between different runs of Perl". This would allow one random shuffle of a hash's values where random is defined as an order that is not predictable. – Chas. Owens Jul 14 '11 at 01:15
  • @Chas. Owens, varies != unpredictable. – ikegami Jul 14 '11 at 01:19
  • @ikegami It is initialized by a pseudo-random number. That is random enough for me. The whole point of the 5.8.1 change was to make the ordering of keys unpredictable to an outside attacker (because if the attacker knew which bucket keys would be stored in, he or she could create pathological data that could be used in a DoS attack). In 5.8.2 and later, the code was changed to only use the pseudo-random number after it detected a pathological case. – Chas. Owens Jul 14 '11 at 01:34
  • @Chas. Owens, No, key ordering does not use any pseudo-random number. – ikegami Jul 14 '11 at 02:26
  • @ikegami Run the command `PERL_HASH_SEED_DEBUG=1 perl -e 0` with a version of `perl` greater than 5.8.0. See that number? That is a pseudo-random number that is used to make the bucket a key gets hashed to unpredictable when pathological data is detected. In Perl 5.8.1, it was always used. The bucket is consistent, but unpredictable. This means that, in Perl 5.8.1, the first time you insert keys into a hash and then examine the order of those keys, the keys will be in random order. After that point the keys will always be in that order, so you can't do it a second time in a single run. – Chas. Owens Jul 14 '11 at 02:49
  • @Chas. Owens, A pseudo-random number is used in a manner that affects bucket selection. Again, you're confusing bucket selection and key ordering. – ikegami Jul 14 '11 at 02:54
  • @ikegami If you were talking about some abstract implementation, I might agree with you, but in Perl's implementation bucket selection is the primary factor in key order. If you are arguing that collisions reduce the amount of entropy, well, yes they do. Items added to the hash last are less likely to be first than items added at the beginning and vice versa, but it is important to remember that even with a million pseudo-random strings (a-z, 3 to 19 characters long) the most items in a bucket is around 8 or 9 and 99.75% of them hold 5 or fewer. That is random enough for most purposes. – Chas. Owens Jul 14 '11 at 03:27
  • @Chas. Owens, No, that's not true. Bucket selection is *NOT* the primary factor in key order. It's one of two equally important factors (the other is position within the bucket). – ikegami Jul 14 '11 at 16:57
  • @ikegami Now you are arguing the meaning of primary. The primary factor is bucket, the secondary factor is position in the bucket. Take a look at the numbers I quoted again. 99.75% of the keys in a million key hash are in buckets with five or fewer keys. A large portion (over 30% I think, but I don't have the program in front of me) of the keys are alone in their buckets. If you don't think the primary factor is which bucket a key is placed in, well, I don't know what I can say to convince you. Is key order as random as the PRNG? No. Is it random enough for most purposes? I would say yes. – Chas. Owens Jul 14 '11 at 19:56
  • @Chas. Owens, No, I'm not. Ignore the first sentence if you want. The point is that it's one of two factors that control the order of every element, and only one is possibly random (and it hasn't been demonstrated that it ever is), so the order can't possibly be random. The odds of guessing the relative order of two elements can be far greater than 50%. – ikegami Jul 14 '11 at 22:04
  • It is demonstrable with Perl 5.8.1. Or you can read the perldelta linked in the question. Hash randomization (their words, not mine) was a feature added in Perl 5.8.1 and then modified in Perl 5.8.2 (so it only randomizes if it detects pathological data). How random does something have to be for you to consider it random? You do know that PRNGs aren't random, right? We don't need cryptographically secure randomness, just reasonable unpredictability. Try downloading Perl 5.8.1 and trying it; I think you will find it does a pretty good job of being unpredictable. – Chas. Owens Jul 14 '11 at 23:25

At a minimum there have been some sloppy documentation updates. In the third paragraph of perlrun's entry for PERL_HASH_SEED it says:

The default behaviour is to randomise unless the PERL_HASH_SEED is set.

which was true only in 5.8.1 and contradicts the paragraph immediately preceding it:

Most hashes by default return elements in the same order as in Perl 5.8.0. On a hash by hash basis, if pathological data is detected during a hash key insertion, then that hash will switch to an alternative random hash seed.

perlsec's entry for Algorithmic Complexity Attacks gets this right:

In Perl 5.8.1 the random perturbation was done by default, but as of 5.8.2 it is only used on individual hashes if the internals detect the insertion of pathological data.

perlsec goes on to say

If one wants for some reason emulate the old behaviour [...] set the environment variable PERL_HASH_SEED to zero to disable the protection (or any other integer to force a known perturbation, rather than random).

[emphasis added]

Since setting PERL_HASH_SEED does not affect the hash order, I'd call it a bug. Searching for "PERL_HASH_SEED" on rt.perl.org didn't return any results, so it doesn't appear to be a "known" issue.

Wolf
Michael Carman
  • It doesn't contradict the paragraph immediately preceding it, since the preceding paragraph explains the randomising behaviour. – ikegami Jul 13 '11 at 21:36
  • The fact that PERL_HASH_SEED does not always affect hash order is not a bug since that's not what it's documented to do. As you quoted and emphasised, it's documented to affect the randomisation, and does exactly that. – ikegami Jul 13 '11 at 21:40
  • @ikegami It can certainly be made more clear. Reading just `perlrun` made me think setting `PERL_HASH_SEED` would cause it to randomize, not enable randomizing if a pathological case was detected. – Chas. Owens Jul 14 '11 at 01:03
  • @ikegami: Both paragraphs claim to describe default behavior. One says that the hash order is randomized, the other says "the same order as in Perl 5.8.0" which is *not* random. How is that not a contradiction? – Michael Carman Jul 14 '11 at 14:30
  • @ikegami: While I can see how you interpret perlsec's description that way (the same thing occurred to me) it feels like a rationalization. If the seed is only used when a need for randomization is detected it should say so explicitly. Furthermore, it would seem more useful to be able to set the seed (always) than to set a known seed in the face of an algorithmic attack. I could have sworn it worked that way in 5.8.1 (i.e. set `PERL_HASH_SEED` to get the same order for different runs of a program) but I don't have that version installed and perldoc.perl.org doesn't go back that far. – Michael Carman Jul 14 '11 at 14:51
  • @Michael I don't know. It is what I expected, but I can see a reason for how it works today. If setting `PERL_HASH_SEED` always triggered the randomizing code, then you could not reproduce the case where pathological data came in and then the hash was reordered (with a specific seed). From a testing point of view, the current behavior makes it possible to reproduce (assuming you have `PERL_HASH_SEED_DEBUG` always turned on so you know what seed is used for a given test) a test that is failing. – Chas. Owens Jul 14 '11 at 20:20