I am trying to solve a problem in a somewhat hacky way using Redis HyperLogLog, but what I want to understand are the limitations and assumptions that HyperLogLog places on the data or its distribution.

Count-min sketches and Bloom filters each have their own well-known limitations, but Google isn't turning up much information on the applications and limitations of HyperLogLog.

I am using Redis HyperLogLog, and as antirez describes, there are no practical limits to the cardinality of the sets we can count. But from a theoretical perspective, does HyperLogLog make any assumptions or impose any constraints on the data or its distribution?

Chenna V

The HyperLogLog algorithm assumes that a strong universal hash function is used. Redis uses MurmurHash64A, which should be good enough from a practical point of view. The Redis HyperLogLog implementation uses 6 bits per register, which is enough to represent any run length of bits within a 64-bit hash value. Hence, the only limitation I see is the 64-bit hash value itself. If the cardinality is on the order of 2^64, there will be many hash collisions, which would eventually lead to large estimation errors. However, cardinalities of this order of magnitude never occur in practice.
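To make these points concrete, here is a minimal Python sketch of HyperLogLog. It assumes BLAKE2b truncated to 64 bits as a stand-in for Redis's MurmurHash64A (Python's standard library has no MurmurHash), and its layout of 2^14 registers, indexed by the low 14 bits with the run length taken from the remaining 50 bits, follows my reading of Redis's dense representation rather than its actual code. The names `HyperLogLog`, `hash64` and `expected_distinct_fraction` are hypothetical. The largest possible run length is 51, so every register fits in 6 bits. The last function estimates what share of n distinct items still map to distinct 64-bit hash values, which is what limits accuracy near 2^64.

```python
import hashlib
import math

# Hypothetical sketch, not Redis's code. Sizes mirror Redis's dense layout:
# 2**14 registers, 6 bits each (standard error ~ 1.04 / sqrt(2**14) ~ 0.81%).
P = 14          # bits of the hash used to pick a register
M = 1 << P      # number of registers (16384)
Q = 64 - P      # bits left over for the run length (50)


def hash64(item: str) -> int:
    """Stand-in 64-bit hash: BLAKE2b truncated to 8 bytes, NOT MurmurHash64A."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    def __init__(self) -> None:
        # Each register holds at most Q + 1 = 51, which fits in 6 bits.
        self.registers = [0] * M

    def add(self, item: str) -> None:
        h = hash64(item)
        index = h & (M - 1)   # low P bits choose the register
        rest = h >> P         # remaining Q bits
        # 1-based position of the lowest set bit (trailing zeros + 1);
        # Q + 1 if the remaining bits are all zero.
        rank = (rest & -rest).bit_length() if rest else Q + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / M)
        estimate = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * M and zeros:
            # Small-range correction (linear counting) from the original paper.
            estimate = M * math.log(M / zeros)
        return round(estimate)


def expected_distinct_fraction(n: int, bits: int = 64) -> float:
    """Expected share of n distinct items that land on distinct hash values.

    An ideal hash with 2**bits outputs yields about
    2**bits * (1 - exp(-n / 2**bits)) distinct values for n items, so the
    sketch can at best count that many, not n.
    """
    space = 2.0 ** bits
    return -math.expm1(-n / space) * space / n


hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user:{i}")   # highly structured, non-random keys
print(hll.count())         # should be within a few percent of 100000
print(expected_distinct_fraction(2**64))  # about 0.632, i.e. 1 - 1/e
```

Feeding sequential keys such as `user:0`, `user:1`, ... should still give an estimate close to the true count, because the hash, not the data, supplies the randomness; adding the same keys again leaves the registers and the estimate unchanged. At 2^64 items, by contrast, only about 63% of them can map to distinct hash values, so a large underestimate is unavoidable at that scale, while at a billion items the fraction is indistinguishable from 1.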

otmar