Questions tagged [hyperloglog]

HyperLogLog is an approximate technique for computing the number of distinct entries in a set.

It is implemented in Algebird, a Scala library for abstract algebra, and can be used in Summingbird to create MapReduce programs that estimate the cardinalities of large datasets in streaming (online) or batch (offline) mode. Redis, a data structure store, also has a HyperLogLog implementation.
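A minimal Python sketch of the idea behind the estimator may help here. It assumes a 64-bit hash taken from SHA-1 and 2^12 registers; both are arbitrary choices for illustration, and the class name and parameter names are hypothetical. It is not how Algebird or Redis implement the structure internally.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (illustrative, not tuned)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash derived from SHA-1 (an assumption made for this sketch).
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw
```

With p = 12 the expected relative standard error is roughly 1.04 / sqrt(4096), or about 1.6%. Adding the same item again never changes the registers, which is why duplicates do not inflate the estimate.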

89 questions

votes

1 answer

Speeding up my implementation of HyperLogLog algorithm

I made my own implementation of the HyperLogLog algorithm. It works well, but sometimes I have to fetch a lot (around 10k-100k) of HLL structures and merge them. I store each of them as a bit string, so first I have to convert each bit string to buckets.…

asked May 15 '14 at 19:03

skaurus


votes

1 answer

Redis Hyperloglog - PFCOUNT side effect

Redis recently released a new data structure called HyperLogLog. It lets us keep a count of unique objects while taking up only 12k bytes. What I don't understand is that Redis's PFCOUNT command is said to be technically a write…

data-structures redis hyperloglog

asked Apr 19 '14 at 00:31

JHAWN

votes

2 answers

Count unique users in last 60 mins per page with Redis HyperLogLog

I’m designing an algorithm to count unique users on a set of pages, based on a 60-minute sliding scale. It needs to find unique IPs (or tokens) that have hit a particular page and total up those hits within the last 60 minutes. I need this to be very fast…

algorithm redis hyperloglog

asked Jul 05 '20 at 10:26

Ben

votes

1 answer

HLL+ Precision for Google BigQuery

The precision of using HLL.INIT(...) and HLL.MERGE(...) is described here: https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions However, I'm wondering whether there is ever a cardinality below which HLL is guaranteed to…

google-cloud-platform google-bigquery hyperloglog

asked May 24 '19 at 18:07

David542


votes

2 answers

What is HyperLogLog and what is it good for?

I was studying the data structures supported by Redis and could not find an explanation that helped me understand what HyperLogLog is. How do I use it, and what is it good for?

database redis hyperloglog

asked Mar 13 '18 at 17:07

Vikto

votes

1 answer

redis HLL too many false positives

HyperLogLog is a probabilistic algorithm. According to the Redis HLL documentation, we should get an error of about 0.81%, but I get errors of 17-20%. I think there is something wrong. This is my simple Perl test script. Is there an error in it? #!/usr/bin/perl…

perl redis hyperloglog

asked Mar 21 '17 at 10:27

Ram


votes

0 answers

Probabilistic algorithm for set cardinality with support for deleting from the set

Is there a probabilistic algorithm for calculating set cardinality that also supports deleting elements from the set? I've been using HyperLogLogs to calculate the cardinalities of some sets and their unions, but when necessity of…

algorithm math data-structures hyperloglog

asked Oct 13 '16 at 15:32

user7014602

votes

1 answer

Redis Hyperloglog limitations

I am trying to solve a problem in a hacky way using Redis HyperLogLog, but I want to understand the limitations of HyperLogLog and the assumptions it makes about the data or its distribution. The count-min and Bloom filter have their own set of…

redis cardinality hyperloglog

asked Apr 05 '16 at 16:02

Chenna V


votes

1 answer

How to improve performance of PIG job that uses Datafu's Hyperloglog for estimating cardinality?

I am using Datafu's HyperLogLog UDF to estimate a count of unique ids in my dataset. In this case I have 320 million unique ids that may appear multiple times in my dataset. Dataset: Country, ID. Here is my code: REGISTER…

apache-pig cardinality hyperloglog

asked Jul 16 '15 at 20:40

mnadig

votes

1 answer

HyperLogLog intersection: why not use min?

When doing a union between two compatible HyperLogLog objects, you can just take the maximum bucket to do a lossless union that doesn't introduce any new error: Union.Bucket[i] = Max(A.Bucket[i], B.Bucket[i]) When doing an intersection though, you…

hyperloglog

asked Mar 08 '15 at 04:38

Alan Wolfe
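The lossless union described in this excerpt can be sketched in Python as below. The helper names (`build_registers`, `merge_registers`), the SHA-1-derived 64-bit hash and the register count are assumptions made for illustration and do not come from the question itself.

```python
import hashlib


def build_registers(items, p=10):
    """Build HyperLogLog registers (2**p of them) from an iterable of items."""
    regs = [0] * (1 << p)
    for item in items:
        # 64-bit hash derived from SHA-1 (an assumption made for this sketch).
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                     # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)        # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        regs[idx] = max(regs[idx], rank)
    return regs


def merge_registers(a, b):
    """Union of two compatible sketches: register-wise maximum."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same number of registers")
    return [max(x, y) for x, y in zip(a, b)]
```

Because each register only ever stores the maximum rank it has seen, merging two sketches this way gives exactly the registers that a single sketch over the combined input would have produced, provided both sketches use the same hash function and register count.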

votes

1 answer

why is data.fu implementing HyperLogLog as an accumulator and not as algebraic?

data.fu has a nice implementation of HyperLogLog for estimating cardinality. However, it's implemented as an Accumulator, which means it will run only in the reducer and not in the combiner (but it will never load the entire set into memory as in…

mapreduce apache-pig cardinality hyperloglog

asked Mar 06 '15 at 21:45

ihadanny


votes

1 answer

HyperLogLog correctness on mapreduce

Something that has been bugging me about the HyperLogLog algorithm is its reliance on the hash of the keys. The issue I have is that the paper seems to assume that we have a totally random distribution of data on each partition; however, in the…

hadoop hash mapreduce hyperloglog

asked Aug 05 '14 at 14:48

aaronman


votes

1 answer

How to apply hyperloglog to a timeseries stream

Can someone explain or link to an explanation about how counting the cardinality of a set with HLL can be used for time series analysis? I'm pretty sure druid.io does exactly this, but I'm looking for a general explanation of how to do this with HLL…

counting druid hyperloglog

asked Apr 05 '14 at 01:48

Emmanuel Oga

vote

0 answers

Writing HyperLogLog Sketches from Apache Spark To Trino

I'm attempting to generate aggregate HLL sketches in a Scala Spark job and push the data to a varbinary in Trino for dashboard aggregations. I'm using the spark-alchemy library to generate the sketches in Spark, but continue to run into…

apache-spark trino hyperloglog

asked Oct 18 '22 at 17:14

J.Fratzke


vote

2 answers

PostgreSQL - HyperLogLog extension not found

Can someone explain more clearly (well, in a way for dummies to understand), or more correctly, how to install the HyperLogLog hll extension for PostgreSQL on my Mac M1 machine? When running CREATE EXTENSION hll; I get: Query 1 ERROR: ERROR: could…

postgresql macos apple-m1 hyperloglog

asked Jun 01 '22 at 02:52

liliget
