Redis Hyperloglog limitations

Question

I am trying to solve a problem in a hacky way using Redis HyperLogLog, and I want to understand what limitations HyperLogLog has and what assumptions it makes about the data or its distribution.

Count-min sketches and Bloom filters have their own sets of limitations, but Google isn't turning up much information on the applications and limitations of HyperLogLog.

I am using Redis HyperLogLog and, as antirez describes, there are no practical limits to the cardinality of the sets we can count. But from a theoretical perspective, does HyperLogLog make any assumptions about, or place any constraints on, the data or its distribution?

otmar · Answer 1 · 2016-06-17T20:48:39.223

The HyperLogLog algorithm assumes that a strong universal hash function is used. Given such a hash function, the estimate depends only on the hash values of the distinct elements, so no further assumption is made about the data or its distribution. Redis uses MurmurHash64A, which should be good enough from a practical point of view. The Redis HyperLogLog implementation uses 6 bits per register, which is enough to represent any bit run length within a 64-bit hash value. Hence, the only limitation I see is the 64-bit hash value itself. If the cardinality is on the order of 2^64, there will be many hash collisions, which would eventually lead to large estimation errors. However, cardinalities of this order of magnitude never occur in practice.
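
To make the role of the hash function and the 6-bit registers concrete, here is a minimal Python sketch of a HyperLogLog. It is not the Redis implementation: `hash64` uses BLAKE2b from the standard library as a stand-in for MurmurHash64A, the estimator is the classical one with a linear-counting correction for small cardinalities (Redis may use a different estimator), and the names `new_registers`, `add`, and `estimate` are made up for illustration. The only thing taken from Redis is the layout: 14 index bits (16384 registers) and register values that fit in 6 bits.

```python
import hashlib
import math

P = 14          # index bits: 2**14 = 16384 registers
M = 1 << P
Q = 64 - P      # remaining hash bits scanned for the run length


def hash64(item: str) -> int:
    # Stand-in 64-bit hash (assumption); Redis uses MurmurHash64A.
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def new_registers() -> list:
    return [0] * M


def add(registers: list, item: str) -> None:
    h = hash64(item)
    index = h & (M - 1)   # low P bits select the register
    rest = h >> P         # remaining Q bits
    # Rank = 1-based position of the first 1 bit in the remaining bits,
    # or Q + 1 if they are all zero. The maximum is 51, which fits in 6 bits.
    rank = 1
    while rank <= Q and not (rest & 1):
        rest >>= 1
        rank += 1
    if rank > registers[index]:
        registers[index] = rank


def estimate(registers: list) -> float:
    # Classical HyperLogLog estimator (harmonic mean of 2**register).
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    if raw <= 2.5 * M:
        # Small-range correction: fall back to linear counting.
        zeros = registers.count(0)
        if zeros:
            return M * math.log(M / zeros)
    return raw


if __name__ == "__main__":
    regs = new_registers()
    for i in range(100_000):
        add(regs, f"user:{i}")
    print(round(estimate(regs)))
```

Note that the input strings only ever enter through `hash64`: sequential, skewed, or heavily repeated inputs all look the same to the registers as long as the hash behaves like a uniform random function. Adding an element a second time never changes a register, so duplicates have no effect on the estimate.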
