1

Is the chance of hash-collisions growing, when you hash the object multiple times?

Meaning, is the chance of collisions higher for hash(hash(object)) than for hash(object)?

Dakkaron
  • 5,930
  • 2
  • 36
  • 51
F.M.F.
  • 1,929
  • 3
  • 23
  • 42

1 Answers1

2

Depends on what exactly you mean.

If the hash changes due to the rehashing, then yes, if it doesn't, then no.

If the object did not change and you rehash it, it will keep the same hash. So for example, the md5 hash of the string teststring will always be D67C5CBF5B01C9F91932E3B8DEF5E5F8.

But if the object changed and you rehash because of that, you will get a new hash.

Now, if you rehash an object that has changed, there might be a higher chance of collisions.

Say for example you have a very simple object containing only one integer value and a very simple hashing algorithm which just takes this value and does a modulo 20 on it. This is an intentionally bad hashing algorithm for this example only.

Now say you have two objects containing a random number. The chance of a hash collision for these two values is 1/20, because you have 20 buckets in the hashing algorithm.

If you now rehash, you once again have a chance of 1/20 chance for a collision, or a 19/20 chance for no collision.

So the chance for no collision after n rehashes is (19/20)^(n+1). So after the first rehash (so you have your original values and rehash one of the values once after it changed) you have a 90.25% chance that you don't have a collision. After the second rehash you are down to 85.76% chance that you don't have any collisions. After 100 rehashes you are down to only a 0.59% chance of no collisions.

That is all depending on that the values change to a new value before each rehash.

Another way to prove that is this:

  • Hashing algorithms give you a limited amount of buckets (=different possible hashes)
  • You can feed your hashing algorithm with an infinite amount of different values
  • Each value needs to be mapped to a bucket
  • If you have an infinite amount of values mapped to a finite amount of buckets, there will be collisions at some time.
Dakkaron
  • 5,930
  • 2
  • 36
  • 51
  • What do you mean by "change due to rehashing"? The outcome hash? – F.M.F. Oct 11 '17 at 09:26
  • Yes. If you rehash the same value the same way, you will always get the same hash. If you rehash the object after it changed, you will get a different hash, which will then change the hash. I'll add it to the answer. – Dakkaron Oct 11 '17 at 09:31
  • What I meant was: When I hash an object and then hash the hash (multiple times), does the chance for hash-collisions with other objects hashed the same way (mutiple times), getting higher? – F.M.F. Oct 11 '17 at 13:33