I was inspired by this unique id code to generate a random 64-bit identifier.

My question: will this be good enough for about 10 million entries?

    # Build 16 random hex digits (16 * 4 bits = 64 bits).
    def self.generateId
      (0..15).collect { rand(16).to_s(16) }.join
    end
Mark

3 Answers

This is the classic birthday problem.

With m = 10^7 ids drawn from n = 2^64 ~ 1.8 * 10^19 possible values, the collision probability is approximately:

p = 1 - exp(-m^2/(2*n))

This gives a collision probability of about 2.7e-06.
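
As a rough check, the formula can be evaluated directly. The sketch below is only illustrative: the helper name `collision_probability` is made up here, and it evaluates both the exact 64-bit space and n rounded up to 10^20 so the two can be compared.

```ruby
# Birthday-problem approximation: p ~ 1 - exp(-m^2 / (2n))
# m = number of ids generated, n = number of possible ids.
# (collision_probability is a hypothetical helper, not part of any library.)
def collision_probability(m, n)
  1.0 - Math.exp(-(m.to_f**2) / (2.0 * n))
end

m = 10**7
puts collision_probability(m, 2**64)    # exact 64-bit id space
puts collision_probability(m, 10**20)   # n rounded up to 10^20
```

Either way the probability comes out on the order of one in a million or smaller for 10 million ids.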

I would say sampling without replacement is your best option.
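
A minimal sketch of what sampling without replacement looks like in Ruby, assuming a single process owns the whole pool of candidate ids (the pool size of 1,000 is an arbitrary value chosen for illustration):

```ruby
# Sampling without replacement: every id is drawn at most once.
pool = (0...1_000).to_a   # hypothetical, deliberately small id space
ids  = pool.sample(10)    # Array#sample(n) picks n elements at distinct indices

puts ids.inspect
```

Because the pool holds no duplicates, the ten ids drawn are guaranteed to differ from each other.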

Nishanth
  • Thanks. What do you mean by "sampling without replacement"? – Mark May 07 '13 at 13:31
  • Imagine a set of numbered items in a bag and draw one at random. Since the item is drawn and not replaced, it will never repeat. Most `math` libraries have support for sampling without replacement. – Nishanth May 07 '13 at 14:19
  • I see. In my case that won't be an option, because ids are generated by many different clients that are not aware of each other. – Mark May 07 '13 at 17:31
  • @Mark how do you plan to seed your RNG on different clients? – Nishanth May 08 '13 at 02:56
  • @e4e5f5 No, I actually use `arc4random()` – Mark May 08 '13 at 06:51

I would make it 128 bits long; that way you don't have to worry about collisions among 10M records.
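
One possible way to do that in Ruby, assuming the standard `securerandom` library is acceptable for your use case (the method name `generate_id128` is just an illustration):

```ruby
require 'securerandom'

# 16 random bytes = 128 bits, rendered as 32 lowercase hex digits.
# (generate_id128 is a hypothetical name chosen for this sketch.)
def generate_id128
  SecureRandom.hex(16)
end

puts generate_id128
```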

Ari

2^64 is about 10^21 (EDIT: this originally read 10^31), which is larger than 10^7 (10 million) by a factor of 10^14. So it is nearly completely safe to use only 64 bits.

Reinhard Männer
  • That's not right, sorry. `2^64 = (2^10)^6 * 2^4 = 1024^6 * 2^4 ~ 10^18 * 10 ~ 10^19` – Mike Sokolov Sep 05 '13 at 02:25
  • You are right, my fault. What I wanted to say is: 2^10 is about 10^3, so 2^64 is about 10^21, so 10^31 was a typo, it should have read 10^21. Anyway, 10^21 is larger than 10^7 by a factor of 10^14, and my argument is still valid. I have edited my answer accordingly. – Reinhard Männer Sep 05 '13 at 18:38