If there are k leading zeros in the bit pattern of a hash, why is the estimated size considered to be 2^(k+1)? Shouldn't it be 2^k? The probability of having k leading zeros should be 1/2^k, and hence the size should be 2^k.

In my code I always get a correct size estimate when I use k+1 instead of k, but I fail to understand the logic behind this.

Golak Sarangi

3 Answers

The intuition you're looking for is that the algorithm relies on the probability of seeing the entire bit pattern at the beginning of the hash (k zeros, followed by a 1), not just the zeros.

The more difficult part is getting from there to estimating the cardinality as 2^(k+1). Unfortunately, the formal proof of this isn't straightforward. In fact, most of the original paper that introduced the method (Flajolet and Martin, Probabilistic Counting Algorithms for Data Base Applications, http://algo.inria.fr/flajolet/Publications/FlMa85.pdf) is devoted to proving that the estimate computed with it is a good one. Subsequent papers (the LogLog and HyperLogLog papers) have similar proofs for their improved estimates.

Hope that helps!

OronNavon

k leading zeros mean that the first k bits are zeros that are followed by a one bit. (Otherwise, we would have more than k leading zero bits.) Therefore, k leading zeros are actually characterized by a bit sequence of length (k+1), for which the probability is 1/2^(k+1).
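
To check this numerically, here is a minimal sketch that draws uniformly random values as a stand-in for hash outputs and measures how often a value has exactly k leading zeros. The 32-bit width, the sample size, and the helper names `leading_zeros` and `fraction_with_exactly` are assumptions made for illustration, not part of any particular library.

```python
import random

BITS = 32  # assumed hash width


def leading_zeros(x: int, bits: int = BITS) -> int:
    """Count the zero bits before the first one bit in a bits-wide value."""
    return bits - x.bit_length()


def fraction_with_exactly(k: int, samples: int = 200_000, seed: int = 1) -> float:
    """Fraction of random bits-wide values with exactly k leading zeros."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(samples)
        if leading_zeros(rng.getrandbits(BITS)) == k
    )
    return hits / samples


# Compare the observed fraction with 1/2^(k+1) for a few small k.
for k in range(4):
    print(k, fraction_with_exactly(k), 1 / 2 ** (k + 1))
```

The observed fractions should land close to 1/2, 1/4, 1/8 and 1/16, because "exactly k leading zeros" fixes k+1 bits: k zeros and the one that ends the run.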

otmar

According to probability theory you are correct! You would expect to have made 2^k observations (on average) before having observed a value with k leading zeros.

The reason your estimate is double what it should be might be that your random function (or hashing function) returns a signed int that is always positive, so a leading zero is always present. This should approximately double your chances of seeing a value with k leading zeros. That is why you would get the correct answer when you use 2^(k+1) instead of 2^k.
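
The doubling effect described here can be illustrated with a small sketch. It assumes a hypothetical hash that returns a non-negative signed 32-bit integer, so only 31 bits are random, and it compares counting leading zeros over all 32 bits against counting over the 31 random bits only. The names and the choice k = 3 are illustrative.

```python
import random


def leading_zeros(x: int, bits: int) -> int:
    """Count the zero bits before the first one bit in a bits-wide value."""
    return bits - x.bit_length()


rng = random.Random(7)
samples = 200_000
k = 3

# Stand-in for a hash that returns a non-negative signed 32-bit int:
# the sign bit is always 0, so only 31 bits carry randomness.
values = [rng.getrandbits(31) for _ in range(samples)]

# Treating the value as 32 bits wide counts the fixed sign bit as a leading zero.
frac_as_32 = sum(leading_zeros(v, 32) >= k for v in values) / samples
# Treating it as 31 bits wide counts only the random bits.
frac_as_31 = sum(leading_zeros(v, 31) >= k for v in values) / samples

print(frac_as_32, frac_as_31)
```

Under these assumptions the first fraction should be near 1/4 and the second near 1/8, since the always-zero sign bit supplies one of the k leading zeros for free.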

Snives