Can someone explain, or link to an explanation of, how counting the cardinality of a set with HyperLogLog (HLL) can be used for time series analysis?
I'm pretty sure druid.io does exactly this, but I'm looking for a general explanation of how to do it with HLL alone, without relying on any specific library, database, or HLL implementation.
A naive way of doing this would be to prefix a timestamp to the things we are counting. For example, using the Redis HLL API, if you are counting events from second 1000001 up to second 1000060:
PFADD SOMEHLLVAR "1000001-event1" "1000001-event2" ...
PFADD SOMEHLLVAR "1000002-event1" "1000002-event3" ...
PFADD SOMEHLLVAR "1000003-event2" "1000003-event3" ...
# Get the count of occurrences of event1 over a minute-long range (pseudocode; the real PFCOUNT takes key names, not elements):
PFCOUNT "1000001-event1" -> 1
PFCOUNT "1000002-event1" -> 1
PFCOUNT "10000..-event1" -> ..
PFCOUNT "1000060-event1" -> 0
...add all numbers! -> 2
One problem with this approach is that you would need to iterate over every second in the range to find, say, the count of a specific event in the last minute.
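To make the per-second iteration concrete, here is a minimal Python sketch of one way to bucket by time: keep one HLL per second and merge the buckets that fall inside the queried range. Everything in it is an assumption for illustration only: `TinyHLL` is a toy HyperLogLog (not Redis's or Druid's implementation), and `buckets`, `record`, and `distinct_in_range` are hypothetical names. Note that an HLL estimates the number of distinct elements, not how many times a given element occurred.

```python
import hashlib
import math
from collections import defaultdict


class TinyHLL:
    """Toy HyperLogLog with 2**p registers (illustrative, not production-grade)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        merged = TinyHLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


# One sketch per second; the dictionary key plays the role of the timestamp prefix.
buckets = defaultdict(TinyHLL)


def record(second, event):
    buckets[second].add(event)


def distinct_in_range(start, end):
    """Estimate the number of distinct events seen in [start, end], inclusive."""
    total = TinyHLL()
    for second in range(start, end + 1):  # still visits every second in the range
        if second in buckets:
            total = total.merge(buckets[second])
    return total.count()


# Same data as the example above.
record(1000001, "event1"); record(1000001, "event2")
record(1000002, "event1"); record(1000002, "event3")
record(1000003, "event2"); record(1000003, "event3")

print(distinct_in_range(1000001, 1000060))  # estimate of distinct events in the minute
```

Because merging is just a register-wise maximum, coarser buckets (per minute, per hour) could be precomputed from the per-second ones to shorten the loop. In Redis, the equivalent union is what `PFMERGE`, or `PFCOUNT` with several keys, computes.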