1

Assume a hash function that produces digests of 160 bits. How many messages do we need to hash to get a collision with approximately 75% probability?

Thank you for your help :)

Bernd Eber

2 Answers

1

The rule of thumb is that there's roughly a 50% chance of a collision after sqrt(d) values are drawn, where d is the number of possible values. The exact figure is slightly more than that (about 1.18 * sqrt(d)), but the square root is a good guideline. So in your case, with d = 2^160, you have roughly a 50% chance of a collision after 2^80 tries.

The other rule of thumb is that after 4*sqrt(d) draws, a duplicate is nearly a certainty.

According to https://en.wikipedia.org/wiki/Birthday_problem#Probability_of_a_shared_birthday_(collision), you can compute the number n of values you need to draw to get a probability p of a duplicate by:

n = sqrt(2 * d * ln(1/(1-p)))

Where ln is the natural logarithm, d is the number of possible values, and p is the probability, from 0 to 1.0.

So in your case:

n = sqrt(2 * 2^160 * ln(1/.25))
n = sqrt(2^161 * 1.38629)

Which is a little less than 2^81, about 2^80.74.
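
If you want to check the arithmetic yourself, here is a minimal Python sketch of the approximation above. The function name `draws_for_collision` and its parameters are made up for illustration, and it assumes the hash outputs are uniformly distributed over 2^bits values.

```python
import math

def draws_for_collision(p, bits):
    """Approximate number of draws needed to reach collision probability p
    among d = 2**bits equally likely values (birthday approximation)."""
    d = 2.0 ** bits
    return math.sqrt(2.0 * d * math.log(1.0 / (1.0 - p)))

n = draws_for_collision(0.75, 160)
# Report the result as a power of two, since n itself is astronomically large.
print(f"n is about 2^{math.log2(n):.2f}")  # about 2^80.74
```

Calling it with p = 0.5 instead gives roughly 2^80.24, which is the "slightly more than sqrt(d)" figure.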

Slothario
Jim Mischel
0

Somewhere in the range of 2 septillion. That's 2,000,000,000,000,000,000,000,000 messages. Here's the equation.

chance of collision = 1 - e^(-n^2 / (2 * d))

Where n is the number of messages, d is the number of possibilities. So if d is 2^160, then n is going to be in the neighbourhood of 2^80.7.
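
To see how the probability grows with n, the equation can be evaluated directly. This is only a rough sketch: `collision_probability` is a hypothetical helper name, and the formula is the usual birthday approximation, which assumes uniformly distributed hash outputs.

```python
import math

def collision_probability(n, d):
    """Approximate chance of at least one collision among n draws
    from d equally likely values: 1 - e^(-n^2 / (2d))."""
    # -expm1(-x) equals 1 - e^(-x) and stays accurate when x is tiny.
    return -math.expm1(-(n * n) / (2.0 * d))

d = 2.0 ** 160
for log2_n in (80, 80.7, 82):
    n = 2.0 ** log2_n
    print(f"n = 2^{log2_n}: p is about {collision_probability(n, d):.4f}")
```

With d = 2^160 this gives about 0.39 at n = 2^80 (the square root of d), about 0.73 at n = 2^80.7, and about 0.9997 at n = 2^82.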

mypetlion
  • 1
    I.e. about half of the hash output size due to the birthday "paradox". Note that the memory requirements to store the hashes to compare them with each other will be in the same order, and as far as I know there is not enough memory *in the world* to do such a thing. These hashes have been created to avoid collisions. SHA-1 however fails at this, google "sha-1 shattered". – Maarten Bodewes Apr 27 '18 at 22:42
  • @MaartenBodewes Not half the output size, but rather the square root of the output size. – Jim Mischel May 01 '18 at 15:49