Questions tagged [hyperloglog]

HyperLogLog is an approximate technique for computing the number of distinct entries in a set. It is implemented in Algebird, a Scala library for abstract algebra, which can be used in Summingbird to create MapReduce programs that estimate the cardinalities of large datasets in streaming (online) or batch (offline) mode. The data structure store Redis also has a HyperLogLog implementation.
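
To make the description above concrete, here is a minimal Python sketch of the data structure. It is written for illustration only and is not Algebird's or Redis's code. The class name HyperLogLog, the SHA-1-derived 64-bit hash, and the default precision p = 14 (16,384 registers, the register count Redis uses) are assumptions chosen for readability. Each item's hash picks a register with its first p bits. That register keeps the largest "rank" (leading zeros + 1) seen in the remaining bits. The count is a bias-corrected harmonic mean over the registers, with linear counting used for small cardinalities.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative; not Algebird's or Redis's code)."""

    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p                       # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # Assumption: a 64-bit hash taken from SHA-1; real libraries use faster hashes
        return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")

    def add(self, item: str) -> None:
        h = self._hash64(item)
        idx = h >> (64 - self.p)                   # first p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Union of two sketches: register-wise maximum
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:               # small-range correction
            return m * math.log(m / zeros)         # linear counting
        return raw
```

An item can only raise a register to a value fixed by its hash, so re-adding the same item changes nothing. Merging two sketches with a register-wise maximum yields exactly the sketch of the union of their inputs. That mergeability is what makes the structure suitable for MapReduce-style aggregation.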

89 questions

1 answer

Which hash function does HyperLogLog use?

I have read in a few articles that HyperLogLog and LogLog use a hash function and that it is solely responsible for the prediction value. If we assign a value to a certain username to predict the number of times the individual has visited a page,…

asked May 20 '22 at 16:46

Rahul Raheja

1 answer

How to understand that the standard error of Redis HyperLogLog is 0.81%

I am confused by the HyperLogLog standard error of 0.81%, so I changed rand() to $n+$j in https://github.com/redis/redis/blob/unstable/tests/unit/hyperloglog.tcl#L48 and changed 5% to 0.81% in…

redis hyperloglog

asked Mar 18 '22 at 07:48

ming
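
The 0.81% figure matches the classic HyperLogLog error formula, 1.04/√m, evaluated for the 16,384 registers (m = 2^14) that Redis uses. Redis stores each register in 6 bits, which also accounts for its 12 KB per key. The following is a worked check, not part of the original question:

```latex
\[
\mathrm{SE} \approx \frac{1.04}{\sqrt{m}}, \qquad m = 2^{14} = 16384
\quad\Longrightarrow\quad
\mathrm{SE} \approx \frac{1.04}{128} = 0.008125 \approx 0.81\%
\]
\[
16384 \text{ registers} \times 6 \text{ bits} = 98304 \text{ bits} = 12288 \text{ bytes} = 12 \times 1024 \text{ bytes}
\]
```

This is one standard deviation, not a hard bound. A test that requires every estimate to fall within 0.81% should therefore be expected to fail regularly. Under a rough normal approximation, only about 68% of estimates land within ±0.81%, and about 95% within ±1.6%.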

0 answers

Error when trying to process HyperLogLog created on Snowflake, in Trino

In Trino, I'm getting the error message Cannot deserialize HyperLogLog: I have a query on Snowflake, doing the following: select __TENANT_ID hll_accumulate(VISITOR_ID) as visitor_hll from [table] where …

amazon-s3 snowflake-cloud-data-platform parquet trino hyperloglog

asked Feb 02 '22 at 09:49

Ethan1701

1 answer

Redis: count-distinct problem (without HyperLogLog)

I need to solve a count-distinct problem in Redis without using HyperLogLog (because of its known 0.81% error). I receive different requests with a list of objects [O1, O2, ... On] for a specific key A. For each list of objects received, Redis…

redis count distinct redis-cluster hyperloglog

asked Jan 27 '22 at 19:34

lordav

0 answers

How do we use BigQuery HLL (HyperLogLog) functions in Looker

I have a quick question on how we can use the BigQuery HLL functions in Looker. For example, there is a BigQuery table with the following structure (image: Sample BigQuery Table). In Looker, do I need to define this field respondents_hll as a dimension or…

google-bigquery unique looker hyperloglog

asked Nov 12 '21 at 04:25

iPrithvi

1 answer

How do I increase the accuracy of Redis HyperLogLog?

I am using a very simple implementation of Redis HLL: PFADD to add the elements and PFCOUNT (sometimes with PFMERGE) to get the count. Is there a way I can tune the efficiency of Redis HLL, for example by increasing the allocated memory?

redis hyperloglog

asked Sep 03 '21 at 01:46

Ram
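
A HyperLogLog sketch's error depends on its register count m = 2^p, not on how many elements are added. The general lever for accuracy is therefore more registers: each quadrupling of memory halves the standard error. The helper below tabulates that trade-off. Its name, hll_profile, is hypothetical, and it assumes 6-bit registers as in Redis's dense encoding. Redis's documented figures (12 KB, 0.81%) correspond to p = 14. Whether a given product lets you change p is implementation-specific.

```python
import math


def hll_profile(p: int, bits_per_register: int = 6) -> tuple[int, float, int]:
    """Return (registers, relative standard error, dense size in bytes) for precision p."""
    m = 1 << p                                  # number of registers
    std_err = 1.04 / math.sqrt(m)               # classic HyperLogLog standard error
    size_bytes = m * bits_per_register // 8     # dense register array only, no header
    return m, std_err, size_bytes


for p in (12, 14, 16, 18):
    m, err, size = hll_profile(p)
    print(f"p={p:2d}  registers={m:6d}  std.err={err:.4%}  size={size} bytes")
```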

1 answer

Using HyperLogLog functions in BigQuery can you get different results from the same query on the same data?

My query looks like: SELECT HLL_COUNT.MERGE((SELECT HLL_COUNT.INIT(key.item) FROM UNNEST(data.list) key)), FROM dataset. Let's say I run this query 10000 times (on the same set of data): will I get 10000 identical results or a small percentage…

google-bigquery hyperloglog

asked Jan 13 '21 at 15:31

Ire00

1 answer

Django cumulative sum of HyperLogLog (HLL) Postgres field

I'm using the HyperLogLog (hll) field to represent unique users, using the Django django-pg-hll package. What I'd like to do is get a cumulative total of unique users over a specific time period, but I'm having trouble doing this. Given a model…

django postgresql hyperloglog

asked May 21 '20 at 15:08

Darkstarone
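
Cumulative unique counts cannot be obtained by summing per-day counts, because a user seen on several days would be counted several times. Instead, the daily sketches have to be unioned, which for HyperLogLog means a register-wise maximum. The sketch below is plain Python rather than Django or postgresql-hll code. It uses tiny hypothetical 4-register arrays to show the running union. Because max is associative, commutative and idempotent, the merged sketch does not depend on merge order, and merging a day twice changes nothing.

```python
from itertools import accumulate


def merge(a: list[int], b: list[int]) -> list[int]:
    """Union of two HyperLogLog register arrays of equal length: element-wise max."""
    if len(a) != len(b):
        raise ValueError("sketches must use the same number of registers")
    return [max(x, y) for x, y in zip(a, b)]


# Hypothetical 4-register sketches for three consecutive days
daily = [
    [1, 0, 3, 2],
    [2, 1, 0, 2],
    [0, 4, 1, 1],
]

# Running union: day 1, days 1..2, days 1..3
cumulative = list(accumulate(daily, merge))
for day, regs in enumerate(cumulative, start=1):
    print(day, regs)
```

The cardinality estimate for each period would then be computed from the corresponding cumulative sketch, never by adding up the daily estimates.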

1 answer

BigQuery to Data Studio: show reliable COUNT DISTINCT regardless of the selected period

In my BigQuery project, I store event data integrated from Firebase. The granularity and dimensions are such that trying to present raw data in Data Studio quickly makes the report VERY slow (1-2 min per page/interaction). I then started to…

count google-bigquery distinct looker-studio hyperloglog

asked Sep 26 '19 at 14:03

Giorgio Terreni

1 answer

Distinct Count algorithm

I am wondering if it is possible to do an approximate distinct count in the following way: I have an aggregation like this: +---------+----------------------+-------------------------------+ | country | unique products sold | helper_data --…

python algorithm google-bigquery hyperloglog

asked May 24 '19 at 01:04

David542

1 answer

URL filtering on top of Redis: Bloom filters or HyperLogLog data structure

I want to implement URL filtering for a distributed crawling system on top of a Redis database (e.g. don't visit the same URL twice, so I need to somehow keep track of all of them with a minimal memory footprint; there is no need to store full…

redis bloom-filter hyperloglog

asked Feb 22 '19 at 11:01

d-d
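
The two structures answer different questions. A Bloom filter answers "have I probably seen this URL before?", which is what a crawler's visited-set needs. HyperLogLog only estimates how many distinct URLs were seen and cannot answer membership at all. Below is a minimal Bloom filter sketch. It assumes SHA-256 with double hashing to derive the k bit positions; the class name and parameters are illustrative, not a Redis API. A Bloom filter has no false negatives but can return false positives, so a crawler using it would occasionally skip a URL it never visited.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # odd, so never 0 mod an even num_bits
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits        # double hashing: h1 + i*h2

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(num_bits=100_000, num_hashes=7)
seen.add("https://example.com/a")
print("https://example.com/a" in seen)   # True: added items are always found
print("https://example.com/b" in seen)   # False here; a false positive is possible but very unlikely
```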

0 answers

How does hashing a stream of values guarantee randomness in HyperLogLog?

From this Stack Overflow post: The main trick behind this algorithm is that if you, observing a stream of random integers, see an integer whose binary representation starts with some known prefix, there is a higher chance that the cardinality of the…

algorithm data-structures bigdata computer-science hyperloglog

asked Feb 20 '19 at 19:38

user10714010
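
Hashing is what makes the "random integers" assumption hold. A good hash maps arbitrary, possibly clustered inputs to bit strings that behave as if uniformly random, so a hash starts with at least k zero bits with probability 2^-k. Identical inputs always hash identically, so duplicates never change the sketch. The Python sketch below assumes a 64-bit hash taken from SHA-1 and p = 14 index bits; both are illustrative choices. It shows how one hash is split into a register index and a rank (position of the first 1-bit). It also checks that sequential inputs still produce the expected geometric distribution of ranks.

```python
import hashlib


def split_hash(h: int, p: int = 14) -> tuple[int, int]:
    """Split a 64-bit hash into (register index, rank of first 1-bit in the rest)."""
    idx = h >> (64 - p)                       # top p bits choose the register
    rest = h & ((1 << (64 - p)) - 1)          # remaining 64 - p bits
    rank = (64 - p) - rest.bit_length() + 1   # leading zeros in `rest`, plus one
    return idx, rank


def hash64(item: str) -> int:
    # Assumption: first 8 bytes of SHA-1 as a 64-bit hash
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


# Sequential, highly non-random inputs still give ranks with the geometric
# distribution the algorithm relies on: P(rank >= k) ~= 2 ** -(k - 1).
ranks = [split_hash(hash64(f"id-{i}"))[1] for i in range(10_000)]
for k in (1, 2, 3, 4):
    share = sum(r >= k for r in ranks) / len(ranks)
    print(f"P(rank >= {k}) ~ {share:.3f} (expected {2 ** -(k - 1):.3f})")
```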

1 answer

Execute extract on Tableau for distinct count using HLL

I have a somewhat huge table (130 million rows) that I am able to crunch on the same server in under 10 minutes, producing a slimmed-down, pre-aggregated table that works just fine and that everyone is happy to use. The table is grouped by around…

postgresql tableau-api distinct-values hyperloglog

asked Oct 26 '18 at 17:14

Alex

2 answers

Is there any effective way to reduce error in HyperLogLog (Redis)?

In Redis, we treat HyperLogLog as a set of distinct elements. As everyone knows, for each key, HLL consumes only 12 KB of memory and produces approximations with a standard error of 0.81%. Since I have so many elements to count, I want to lower…

database algorithm data-structures redis hyperloglog

asked Jul 02 '18 at 04:19

simon_qiu

1 answer

How to get unique user count for custom Firebase event with multiple dimensions applied?

I'm currently trying to count unique users for my custom Firebase events in BigQuery. While I've been able to get the aggregate figures by using the APPROX_COUNT_DISTINCT function, I'm still stuck on getting the correct (unique) count when…

firebase google-bigquery firebase-analytics hyperloglog

asked May 15 '18 at 13:25

Peter P
