Questions tagged [hyperloglog]

HyperLogLog is an approximate technique for computing the number of distinct entries in a set. It is implemented in Algebird, a Scala library for abstract algebra, and can be used in Summingbird to create MapReduce programs that estimate the cardinalities of large datasets in streaming (online) or batch (offline) mode. Redis, a data structure store, also has a HyperLogLog implementation.
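
To make the description above concrete, here is a minimal sketch of the idea in Python. It is not the Algebird or Redis implementation: the class name `HyperLogLog`, the precision parameter `p`, and the choice of SHA-1 as the hash are all assumptions made for illustration. It only shows the usual steps: hash each item, use the first `p` bits to pick a register, store the largest "position of the first 1-bit" seen in the remaining bits, and combine the registers with a harmonic mean.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical class, not a library API)."""

    def __init__(self, p=12):
        # p is the number of index bits; m = 2**p registers.
        # The alpha formula below is the standard constant for m >= 128 (p >= 7).
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 in this sketch")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item):
        # Assumption: SHA-1 truncated to 64 bits stands in for any well-mixed hash.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        x = self._hash64(item)
        width = 64 - self.p
        index = x >> width               # first p bits select the register
        rest = x & ((1 << width) - 1)    # remaining bits
        # Rank = number of leading zeros in `rest` plus one.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other):
        # The union of two sketches is the element-wise maximum of their registers.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(100000):
        hll.add(f"user-{i}")
    print("estimated distinct items:", hll.count())
```

With `p = 12` the sketch keeps 4096 small registers regardless of how many items are added, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%. Production implementations add further corrections that this sketch leaves out.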

89 questions
0
votes
1 answer

Count Unique using Redis and MongoDB (HyperLogLog)

I have a collection in MongoDB with a sample doc as follows - { "_id" : ObjectId("58114e5e43d6420b7db4e15c"), "browser" : "Chrome", "name": "hyades", "country" : "in", "day" : "16-10-21", "ip" : "0.0.0.0", "class" :…
hyades
0
votes
1 answer

HyperLogLog implementation with Spark batch + Cassandra

I am looking to implement the HyperLogLog algorithm to count distinct users for different audience segments (or filters). I use Cassandra + Spark batch. I am wondering whether Cassandra provides any support for a HyperLogLog type. I could not find any plugin or…
Sammy
0
votes
3 answers

Why does HyperLogLog work, and which real-world problems does it apply to?

I know how Hyperloglog works but I want to understand in which real-world situations it really applies i.e. makes sense to use Hyperloglog and why? If you've used it in solving any real-world problems, please share. What I am looking for is, given…
Chenna V
0
votes
1 answer

How to increase performance of calculating aggregations?

The problem I am trying to address seems to be trivial. I have huge collections of events (they actually come from a mobile app, so they are mobile events). Each event is described by several attributes: operating_system, create_time, version, resolution…
homar
0
votes
1 answer

DataStructure for Intersection Counts

We have a requirement to maintain distinct counts for every hour of each day of the month, for various combinations (users meeting a criterion). We are thinking of using HyperLogLog for it; one of the other requirements is to provide counts of the union…
anishek
0
votes
1 answer

Simple cardinality estimation algorithm

There's the HyperLogLog algorithm, but it is quite complex. Is there any simpler, space-efficient approach that could be expressed in a couple of lines of code?
Alex Craft
0
votes
1 answer

Atomic probabilistic counting and set membership in MongoDB

I am looking to do probabilistic counting and set membership using structures such as bloom filters and hyperloglog. I assume I can store such structures as binary data, but I don't want to use optimistic locking (a.k.a. update if current) because…
Daniel Siegmann
0
votes
1 answer

Counter grouped by category, author and date in Redis

I am implementing a system that stores a large amount of data in a relational DB. Data can be classified into categories and has an author. I want to get the number of items grouped by date, category and author and the sum of all items of each…
Garet
0
votes
0 answers

creating hyperloglog in mongodb

I am trying to write HyperLogLog in MongoDB. What is the MongoDB equivalent of the following query (written in Oracle)? create table my_hll as select mod(ora_hash(n), 1024) bucket, max(num_zeroes(trunc(ora_hash(n)/1024)))+1 val from previous_data group by…
user3526896
0
votes
1 answer

What are leading zeroes in regards to HyperLogLog?

I was reading antirez.com, Wikipedia and some other sources to understand what HLL is and how it works, but each time the term "Leading Zeroes" is used I stumble. Please explain what it means when we talk about HyperLogLog.
exebook
0
votes
0 answers

redis hyperloglog.pfmerge inconsistency

pfadd today, item1, item2, ..., itemM pfadd tomorrow, item1, item2, ..., itemN pfadd so-on, item1, item2, ..., itemP ... pfcount today // returns 8000 pfcount tomorrow // returns 9000 pfcount so-on // returns 13000 pfcount today,…
Inanc Gumus
0
votes
1 answer

Return value of PFADD in Redis

According to Redis documentation on PFADD command: Return value Integer reply, specifically: 1 if at least 1 HyperLogLog internal register was altered. 0 otherwise. Can anyone explain the following two points? Does this mean PFADD will return…
mjalajel
-1
votes
1 answer

Hyperloglog for Tinkerpop, .count() approximation

Is there a solution similar to HyperLogLog for graph databases like TinkerPop? The .count() step takes forever on a large dataset; however, an approximation would be sufficient.
-3
votes
1 answer

Possible to do a pivot outside of BigQuery?

Let's say I'm looking to build the following pivot table: // count by age age male female 1-25 18 23 26-100 19 10 To do this, I can do a basic aggregation like this: SELECT age, gender, count(*) GROUP BY age,…
David542