2

I was studying the data structures supported by Redis and could not find an explanation that made me understand what HyperLogLog is.

How do I use it and what is it good for?

Vikto

2 Answers

8

Basically it is a kind of Redis Set that uses an optimized algorithm to count elements without consuming much memory. The difference between a Set and a HyperLogLog is that with a HyperLogLog you can only add elements, count the unique elements, and merge several HyperLogLogs into another one. You don't store the members in a HyperLogLog and retrieve them later, as you could with a SET; you only store a summary of which distinct members have occurred. That is the reason HyperLogLog doesn't provide a command to retrieve its stored members.

A clear use case: you have a huge set, you need to count its unique elements very often, you do not care which elements are in it, and you want memory use to stay low even when it grows a lot. For instance, imagine a high-traffic system with a large number of very active users, where you want to know the number of unique visitors to every webpage. You want near-real-time numbers, so you query the unique visitors for every page once per second. You could create one HyperLogLog per URI in your system, representing the webpage, and every time a user visits a URL you PFADD the user_id:

PFADD /api/show/concerts id789989

Then every second you iterate over every URL's HyperLogLog to get the number of unique visitors:

PFCOUNT /api/show/concerts

145542

PFCOUNT /api/show/open-airs

25565223
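The same flow can be sketched from application code. This is only a minimal sketch: the helper names `record_visit` and `unique_visitors` are made up for illustration, and `client` is assumed to be any object exposing `pfadd` and `pfcount` the way redis-py's `redis.Redis` does.

```python
def record_visit(client, path, user_id):
    """Add user_id to the HyperLogLog stored under the page path (PFADD)."""
    # PFADD returns 1 if the internal summary changed, 0 otherwise.
    return client.pfadd(path, user_id)


def unique_visitors(client, paths):
    """Return {path: approximate number of unique visitors} (one PFCOUNT per key)."""
    return {path: client.pfcount(path) for path in paths}


# Hypothetical usage, assuming redis-py is installed and a Redis server is reachable:
#   import redis
#   client = redis.Redis()
#   record_visit(client, "/api/show/concerts", "id789989")
#   print(unique_visitors(client, ["/api/show/concerts", "/api/show/open-airs"]))
```

Keeping the client as a parameter means the once-per-second polling loop can call `unique_visitors` with whatever connection the application already holds.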

You might say: I can get the same functionality from a SET, with the benefit of keeping the user_ids as members. Yes, you can, but a set stores every member, so its memory grows with the number of visitors, and you pay that for every page you track, however often you call SCARD. So unless you need to store the user_ids for some reason, HyperLogLogs are the better option as counters of unique elements. For our use case, imagine having 200-300 sets with around 20-30k users in each.
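To put rough numbers on that, here is a back-of-the-envelope sketch. It uses the 12 kB upper bound that Redis documents per HyperLogLog key. The 8 bytes per set member is purely an assumption for illustration (a bare 64-bit id with no overhead at all), so treat the set figure as an optimistic guess rather than a measurement.

```python
HLL_MAX_BYTES = 12 * 1024  # documented upper bound for one HyperLogLog key


def hll_memory(num_keys):
    """Worst-case bytes for num_keys HyperLogLogs, regardless of how many users each saw."""
    return num_keys * HLL_MAX_BYTES


def set_memory(num_keys, members_per_key, bytes_per_member):
    """Bytes for num_keys sets; bytes_per_member is an assumed figure, not a measured one."""
    return num_keys * members_per_key * bytes_per_member


keys, members = 300, 30_000
print(hll_memory(keys))                                # 3686400 bytes, about 3.7 MB
print(set_memory(keys, members, bytes_per_member=8))   # 72000000 bytes, about 72 MB
```

The HyperLogLog total does not change if each page gets 30 million visitors instead of 30 thousand, while the set total scales with the member count.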

The correspondence between HyperLogLog and Set commands:

  • PFADD = SADD
  • PFCOUNT = SCARD
  • PFMERGE = SUNIONSTORE (the merged result is written to a destination key)
Averias
  • This is exactly what I was looking for. Thank you, man. I am starting to understand what Hyperloglog is :) – Vikto Mar 14 '18 at 02:08
1
  • Strictly speaking, HyperLogLog is an algorithm rather than a data type, but Redis exposes it as a type.

  • The algorithm hashes each string and updates a small fixed-size summary from that hash, so it kind of remembers having seen the string without actually storing it.

  • It has nothing to do with logging (I thought it did). It is used whenever we want to keep track of how many unique elements a collection has, and specifically an approximate count of them.

  • It is similar to a set but does not store the elements.

  • Adding and counting run in O(1), constant time, and each key uses a very small amount of memory: up to 12 kB.

  • The HyperLogLog algorithm is probabilistic, which means that it does not guarantee 100 percent accuracy, because it does not actually store the individual items. The Redis implementation of HyperLogLog has a standard error of 0.81 percent. That means that if it reports 1000 views, the real count is typically within about 8 of that, roughly 992 to 1008, although larger deviations are possible. This error is fine for counting views, but if you need to keep track of unique usernames or emails, you should store them in sets.
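To make the points above more concrete, here is a toy HyperLogLog in plain Python. It is a sketch of the textbook algorithm, not Redis's actual code: the class name, the use of SHA-1 as the hash, and the choice of 2^14 registers are assumptions made for illustration. With 16384 registers the usual error formula 1.04/sqrt(m) gives about 0.81 percent, and 16384 six-bit registers take 12 kB, which lines up with the figures quoted above.

```python
import hashlib
import math


def standard_error(m):
    """Textbook relative standard error of HyperLogLog with m registers."""
    return 1.04 / math.sqrt(m)


class ToyHyperLogLog:
    def __init__(self, p=14):
        self.p = p                      # number of hash bits used as register index
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m   # fixed size, however many items are added

    def _hash64(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        x = self._hash64(item)
        index = x >> (64 - self.p)                # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the first 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank          # keep only the maximum, not the item

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            estimate = m * math.log(m / zeros)    # small-range (linear counting) correction
        return round(estimate)

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


hll = ToyHyperLogLog()
for i in range(10_000):
    hll.add(f"user:{i}")
    hll.add(f"user:{i}")   # duplicates never change the registers
print(hll.count())                 # close to 10000, but not necessarily exact
print(standard_error(1 << 14))     # about 0.0081, i.e. 0.81 percent
```

Because only the per-register maximum is kept, there is no way to list the members back, which is why the structure can add, count and merge but nothing else.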

Here are a few examples of where HyperLogLogs can be used:

• Counting the number of unique users who visited a website

• Counting the number of distinct terms that were searched for on your website on a specific date or time

• Counting the number of distinct hashtags that were used by a user

• Counting the number of distinct words that appear in a book

Yilmaz