
I have read that randomness and a uniform output distribution are important properties of a hash function. How can I compare the randomness of two different hash functions?

Kishan Kishore

1 Answer


Take two test strings that differ only very slightly, ideally by just one bit, for example BBBBBB and BBBBBC. Hash each string with the hash function and count how many bits of the output change as a result of the one-bit change in the input. An ideally random hash function should flip, on average, about half of the output bits. Cryptographic hash functions try to approach this ideal, while other hash functions go some way towards it but sacrifice ideal behaviour for speed.

Repeat for many pairs of almost identical strings to get an average measure of how random the first hash function is. Repeat for the second hash function. The one which gets closest to 50% of the bits changed on average is probably the more random hash function.
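The procedure above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the helper names (`hamming_distance`, `flip_bit`, `avalanche_score`, `weak_sum`), the random 16-byte inputs, and the number of trials are all made up for the example, and `weak_sum` is a deliberately poor additive checksum used as a contrast to SHA-256, not a real-world hash.

```python
import hashlib
import random


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit flipped."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def avalanche_score(hash_fn, trials=2000, msg_len=16, seed=0) -> float:
    """Average fraction of output bits changed by a one-bit input change."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    changed = 0
    total = 0
    for _ in range(trials):
        msg = bytes(rng.randrange(256) for _ in range(msg_len))
        other = flip_bit(msg, rng.randrange(msg_len * 8))
        h1, h2 = hash_fn(msg), hash_fn(other)
        changed += hamming_distance(h1, h2)
        total += len(h1) * 8
    return changed / total


def sha256_digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def weak_sum(data: bytes) -> bytes:
    """Hypothetical weak hash: sum of the bytes, as a 32-bit value."""
    return (sum(data) & 0xFFFFFFFF).to_bytes(4, "big")


if __name__ == "__main__":
    for name, fn in [("sha256", sha256_digest), ("weak_sum", weak_sum)]:
        print(f"{name}: {avalanche_score(fn):.3f} of output bits changed")
```

A score close to 0.5 corresponds to the ideal described above; the additive checksum should land far below that, because a one-bit input change only nudges the sum by a power of two.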

This test does not look at other criteria like speed.

rossum
  • Doesn't that test the avalanche criterion for the hash functions? I wanted to know about other ways like measuring the entropy of the output bit stream or the chi-squared test. – Kishan Kishore Mar 12 '16 at 14:56
  • Yes it does. For the other ways you will have to ask someone else, or do your own internet research. The avalanche effect is easy to test. – rossum Mar 12 '16 at 15:07
  • I am on it. But what do I measure for all the input pairs? Is it the Hamming distance between the hashes of the two strings, and then maybe the average Hamming distance over all pairs? – Kishan Kishore Mar 12 '16 at 15:32
  • And also do BBBBBB and BBBBBC differ only by one bit? – Kishan Kishore Mar 12 '16 at 18:57
  • Yes. B = 0x42 = 0b01000010; C = 0x43 = 0b01000011. The last bit is different. – rossum Mar 12 '16 at 19:20
  • And am I correct about the avalanche effect measurement? – Kishan Kishore Mar 12 '16 at 20:12
  • Yes. My answer discussed the avalanche effect. – rossum Mar 12 '16 at 22:02
  • What if a hash doesn't use the entirety of its fixed-length output? djb2 or sdbm, for example. – bryc Mar 07 '17 at 05:28
  • @bryc Same criterion: 50% of the output bits change for a one-bit input change. It is just that there are not as many bits, so the collision probability will be higher. – rossum Mar 07 '17 at 08:44
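
As an illustration of the points raised in the comments, here is a minimal Python sketch that checks that BBBBBB and BBBBBC really differ by one bit and then measures the Hamming distance between their djb2 hashes. The `djb2` function follows the usual `hash * 33 + c` recurrence starting from 5381; truncating it to 32 bits is an assumption made for this example, and the helper name `bit_diff` is made up.

```python
def bit_diff(a: bytes, b: bytes) -> int:
    """Hamming distance in bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def djb2(data: bytes) -> int:
    """djb2 string hash, truncated to 32 bits (the width is an assumption)."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    s1, s2 = b"BBBBBB", b"BBBBBC"

    # 'B' = 0x42 = 0b01000010 and 'C' = 0x43 = 0b01000011: only the last bit differs.
    print("input bits changed:", bit_diff(s1, s2))

    h1, h2 = djb2(s1), djb2(s2)
    changed = bin(h1 ^ h2).count("1")
    print(f"djb2 output bits changed: {changed} of 32")
```

Because djb2 adds the final byte after the multiplication, the two hashes here differ by exactly 1 as integers, so far fewer than half of the 32 output bits are expected to change for this particular pair. Averaging over many pairs, with the flipped bit in different positions, gives a fairer picture.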