Questions tagged [hyperloglog]

HyperLogLog is a probabilistic algorithm for estimating the number of distinct entries in a set.

HyperLogLog is implemented in Algebird, a Scala library for abstract algebra. It can be used in Summingbird to create MapReduce programs that estimate the cardinality of large datasets in streaming (online) or batch (offline) mode. The Redis data structure store also has a HyperLogLog implementation.

89 questions

vote

2 answers

What Algorithm is used by java.util.HashSet and java.util.TreeSet to store unique values in its structure?

I have come across multiple algorithms, such as the Flajolet-Martin algorithm and HyperLogLog, for finding the unique elements in a list of elements, and suddenly became curious about how Java calculates it. And what is the time complexity in each of these…

asked Oct 22 '17 at 01:54

Phenomenal One


vote

1 answer

When should Redis HyperLogLog be avoided and why?

I have some basic ideas of how Redis HyperLogLog works and when to use it. Before using it I did a test: I pfadded some consecutive numbers to an HLL entry (to mimic user ids), and Redis soon gave a false positive result. To be exact, if you pfadd…

algorithm redis hyperloglog

asked Sep 04 '17 at 11:48

adamsmith


vote

1 answer

Merge uniq counters, probabilistic data structures

There are two sets, {1, 2, 3} and {3, 4}, with 3 and 2 unique items respectively. Now let's count the unique items in the merged set. If we just sum the counters, 3 + 2 = 5, the result is wrong (it should be uniq(1 2 3 3 4) = 4). Is there a way to do it using only the…
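For illustration, here is a minimal sketch of how a HyperLogLog-style counter can be merged without keeping the items. It assumes 4096 registers and a 64-bit hash taken from SHA-1; the helper names `registers_for`, `merge`, and `estimate` are made up for this example and do not come from any particular library. The point it tries to show is that merging takes the element-wise maximum of the registers instead of summing the counts, so the merged sketch is identical to one built from the union of the items.

```python
import hashlib
import math

P = 12            # index bits (an assumed value)
M = 1 << P        # number of registers

def registers_for(items):
    """Build the register array for an iterable of items."""
    regs = [0] * M
    for item in items:
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        index = h >> (64 - P)                      # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)           # remaining bits
        rank = (64 - P) - rest.bit_length() + 1    # position of the leftmost 1-bit
        regs[index] = max(regs[index], rank)
    return regs

def merge(a, b):
    """Union of two sketches: element-wise maximum of their registers."""
    return [max(x, y) for x, y in zip(a, b)]

def estimate(regs):
    """Raw HyperLogLog estimate with the small-range (linear counting) correction."""
    alpha = 0.7213 / (1 + 1.079 / M)               # bias constant, valid for M >= 128
    raw = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw

merged = merge(registers_for([1, 2, 3]), registers_for([3, 4]))
print(estimate(merged))   # an estimate close to 4, not 3 + 2 = 5
```

Because a repeated item always hashes to the same register and rank, the shared item 3 contributes nothing extra to the merged sketch.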

data-structures probability cardinality hyperloglog

asked Jan 26 '17 at 15:27

Alex Craft


vote

2 answers

How LogLog algorithm with single hash function works

I have found dozens of explanations of the basic idea behind LogLog algorithms, but they all lack details about how splitting the hash function result works. I mean, using a single hash function is not precise, while using many functions is too expensive. How…
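As a rough sketch of the splitting idea (often called stochastic averaging), one 64-bit hash value can be cut into two parts: the top bits choose a register, and the remaining bits supply the rank, that is, the position of the leftmost 1-bit. The choice of SHA-1, the value P = 10, and the names `hash64`, `split`, and `build_registers` are assumptions made for this example, not part of any specific implementation.

```python
import hashlib

P = 10            # number of index bits (an assumed value)
M = 1 << P        # number of registers, i.e. simulated substreams

def hash64(item):
    """64-bit hash from the first 8 bytes of SHA-1 (an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")

def split(h):
    """Split one 64-bit hash into (register index, rank)."""
    index = h >> (64 - P)                      # top P bits choose the register
    rest = h & ((1 << (64 - P)) - 1)           # the remaining 64 - P bits
    # Rank is the 1-based position of the leftmost 1-bit in `rest`;
    # an all-zero `rest` gets the maximum rank, 64 - P + 1.
    rank = (64 - P) - rest.bit_length() + 1
    return index, rank

def build_registers(items):
    """Each register keeps the largest rank seen in its substream."""
    registers = [0] * M
    for item in items:
        index, rank = split(hash64(item))
        registers[index] = max(registers[index], rank)
    return registers
```

With this layout a single hash function behaves like M independent observations, because each item lands in exactly one of the M substreams.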

database algorithm math data-structures hyperloglog

asked Oct 23 '16 at 11:49

VB_


vote

2 answers

Determine percentage of unused keys in large redis DB

I have a Redis database with many millions of keys in it. Over time, the keys that I have written to and read from have changed, and so there are many keys that I am simply not using any more. Most don't have any kind of TTL either. I want to get a…

database redis key ttl hyperloglog

asked Aug 26 '16 at 20:06

alec

vote

3 answers

Postgresql-hll (or another Hyperloglog data type/structure) for Redshift

Need to be able to report on Unique Visitors, but would like to avoid pre-computing every possible permutation of keys and creating multiple tables. As a simplistic example, let's say I need to report Monthly Uniques in a table that has the…

amazon-redshift hyperloglog

asked Aug 18 '16 at 16:23

Sologoub


vote

1 answer

Cardinality approximation for logical set operations – (The "HyperLogLog" for AND/OR/XOR)

We are currently facing an interesting problem. We would like to estimate the cardinality of a set without the need to store every single item (typically bitmaps/bitsets are a nice approach). A very nice algorithm is the so-called HyperLogLog…

algorithm data-structures estimation hyperloglog

asked May 12 '16 at 22:29

Fritz

vote

1 answer

How to get a family of independent universal hash function?

I am trying to implement the HyperLogLog counting algorithm using stochastic averaging. To do that, I need many independent universal hash functions to hash items in different substreams. I found that there are only a few hash functions available in…

python hash hyperloglog

asked Apr 20 '16 at 07:57

Louis Kuang

vote

1 answer

How does one store unique "Likes" or "Views" or sets at scale?

I'd like to get some insight into how various companies solve counting/incrementing the number of "likes"/"views"/"retweets" or something similar at scale. At userbases past 50 million monthly active users, I've seen both Redis and Cassandra used to…

cassandra redis set hyperloglog

asked Apr 08 '16 at 19:23

nflacco


vote

1 answer

How do you test an implementation of Hyperloglog?

There are so many HyperLogLog implementations out there, but how do you verify / test a HyperLogLog implementation? How do you check its "accuracy" and its "error" bound behavior? Just throwing some static test cases at it looks very ineffective. More concrete,…
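One possible approach, sketched here under assumptions, is a statistical test: feed the implementation inputs of known cardinality over several trials and compare the observed relative error with the theoretical standard error of about 1.04 / sqrt(m) for m registers. The compact estimator below is only a stand-in for the implementation under test, and the names `estimate_distinct` and `relative_errors`, the choice of SHA-1, and P = 10 are all hypothetical.

```python
import hashlib
import math

P = 10                               # index bits (assumed); M = 1024 registers
M = 1 << P
EXPECTED_STD_ERROR = 1.04 / math.sqrt(M)

def estimate_distinct(items):
    """Stand-in HyperLogLog estimator; swap in the implementation under test."""
    regs = [0] * M
    for item in items:
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index = h >> (64 - P)
        rest = h & ((1 << (64 - P)) - 1)
        regs[index] = max(regs[index], (64 - P) - rest.bit_length() + 1)
    alpha = 0.7213 / (1 + 1.079 / M)            # bias constant, valid for M >= 128
    raw = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)          # small-range correction
    return raw

def relative_errors(n, trials):
    """Signed relative error of the estimate for `trials` disjoint inputs of size n."""
    errors = []
    for t in range(trials):
        items = [f"trial{t}-item{i}" for i in range(n)]
        # Every item is fed twice: duplicates must not change the estimate.
        est = estimate_distinct(items + items)
        errors.append((est - n) / n)
    return errors

errors = relative_errors(20000, 5)
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
print(mean_abs_error, EXPECTED_STD_ERROR)
```

A test suite could then assert that the mean absolute error stays within a small multiple of the theoretical standard error, and repeat the check across several cardinality ranges (small, medium, large) since the correction steps differ between them.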

testing verification hyperloglog

asked Jan 08 '16 at 18:07

ETOMG

vote

1 answer

How to migrate hyperloglog key to azure redis

I am trying to migrate a Redis HyperLogLog key from one server to the Azure Redis service using the MIGRATE command, but as far as I know MIGRATE doesn't support moving a key to a Redis server that requires authentication. How can I migrate hyperlolog…

azure redis azure-redis-cache hyperloglog

asked Dec 02 '15 at 15:50

Kobynet

vote

1 answer

Filtering huge quantities of data with combinations of logical expressions

I have huge quantities of data represented as (for example) - User ID | Gender | Location | Type of User There may be more columns depending on the use case. The location is denoted by a pincode. I recently read about HyperLogLog and the Redis…

search filter redis bigdata hyperloglog

asked Jul 24 '15 at 08:08

frugalcoder

votes

0 answers

Implementing HLL in python to estimate the cardinality

I'm trying to implement the HLL algorithm in Python. I'm using data folders with the format of 13 bytes "\x00\x01\x02\x03\x05\x06\x07\x08\x09\x0a\x0b\x0c", described as follows: srcIP = "\x00\x01\x02\x03" srcPort = "\x04\x05" dstIP =…

python algorithm hash count hyperloglog

asked Jun 19 '23 at 08:29

Ella

votes

1 answer

Has a single HyperLogLog the same accuracy than merging several ones?

If I create a HyperLogLog per day to count unique visitors, and then on the 1st of January I merge the last 365 of them, will I get the same value as if I had kept a single HyperLogLog for the whole 365 days? I guess not. But how different would those values…

redis hyperloglog

asked Dec 17 '22 at 19:34

vtscop

votes

1 answer

If HyperLogLog in Redis does not store the actual members but only count, how does PFMERGE work?

Does HyperLogLog store the actual members or only the count of members it is storing? If it is not storing the actual members, how does PFMERGE know which element to merge as count of 1 even when they are repeated across multiple HyperLogLog PFADD…

redis hyperloglog

asked Aug 15 '22 at 06:19

Ankit Sahay

