pfadd today item1 item2 ... itemM
pfadd tomorrow item1 item2 ... itemN
pfadd so-on item1 item2 ... itemP
...

pfcount today      // returns  8000
pfcount tomorrow   // returns  9000
pfcount so-on      // returns 13000

pfcount today tomorrow so-on
                   // returns 28000

Although the items are approximately the same across the keys, the cardinalities differ far more than I expected. Why is that? I was expecting a cardinality of around 12000 from the pfcount over all the days.
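The expectation above can be sketched with exact Python sets standing in for the HyperLogLog keys. This is only an illustration under assumed data: the item names and the amount of overlap are hypothetical, and real pfcount results are approximations rather than exact sizes. What it shows is that, for exact sets, the union of heavily overlapping sets falls between the size of the largest set and the sum of all the sizes.

```python
# Exact sets standing in for HyperLogLog keys.
# The items and overlap sizes are hypothetical, chosen only to match
# the per-day counts quoted in the question.
def make_day(shared, extra, label):
    """Build one day's items: `shared` items common to all days plus `extra` unique ones."""
    common = {f"item{i}" for i in range(shared)}
    unique = {f"{label}-only{i}" for i in range(extra)}
    return common | unique

today = make_day(7000, 1000, "today")        # 8000 items
tomorrow = make_day(7000, 2000, "tomorrow")  # 9000 items
so_on = make_day(7000, 6000, "so-on")        # 13000 items

# Adding the per-key counts double-counts the shared items.
sum_of_counts = len(today) + len(tomorrow) + len(so_on)
# The union counts every distinct item once.
union_count = len(today | tomorrow | so_on)

print(sum_of_counts)  # 30000
print(union_count)    # 16000
```

With these assumed sets the union can never drop below 13000 (the largest day) or exceed 30000 (the sum), so a result near the sum would suggest that the keys share few items.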


date    pfadd      pfcount

10-15   40.754     205.322
10-14   40.055     196.249
10-13   39.877     193.830
10-12   13.079     18.151

Also, in the data above, the pfadd column counts how many times pfadd returned 1, and the pfcount column is the result of running pfcount on the same key. Why are the pfadd and pfcount values so different?
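The measurement described above can be sketched roughly as follows. `ExactSetClient` and `compare_counts` are hypothetical names introduced for illustration. The stand-in client uses exact sets so the example runs without a server; it is not a HyperLogLog. A redis-py client exposes `pfadd(name, *values)` and `pfcount(*sources)` with the same shape, so it could presumably be passed in instead.

```python
class ExactSetClient:
    """Hypothetical in-memory stand-in with pfadd/pfcount-style methods.

    It stores exact sets, so unlike Redis it is not a HyperLogLog.
    """

    def __init__(self):
        self._keys = {}

    def pfadd(self, name, *values):
        # Return 1 if the stored set changed, 0 otherwise.
        bucket = self._keys.setdefault(name, set())
        before = len(bucket)
        bucket.update(values)
        return 1 if len(bucket) > before else 0

    def pfcount(self, *sources):
        # Count the distinct items in the union of the named keys.
        merged = set()
        for name in sources:
            merged |= self._keys.get(name, set())
        return len(merged)


def compare_counts(client, key, items):
    """Add items one at a time; return (pfadd calls that returned 1, pfcount of the key)."""
    ones = sum(client.pfadd(key, item) for item in items)
    return ones, client.pfcount(key)


client = ExactSetClient()
result = compare_counts(client, "10-12", ["a", "b", "a", "c", "b"])
print(result)  # (3, 3)
```

With exact sets the two numbers always agree. The Redis documentation describes the PFADD reply as 1 when at least one internal HyperLogLog register was altered, which is not the same as "this element is new", so against a real server the two columns need not match.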

Inanc Gumus
  • PFCOUNT for multiple keys will return the sum of counts whereas it appears you want to use PFMERGE - have you tried that? – Itamar Haber Oct 15 '14 at 18:17
  • No, it does not, according to the documentation: "When called with multiple keys, returns the approximated cardinality of the union of the HyperLogLogs passed, by internally merging the HyperLogLogs stored at the provided keys into a temporary HyperLogLog." – Inanc Gumus Oct 16 '14 at 08:50
  • What do you mean by "_the items are approximately the same_"? Do you add the same elements to each key? Or the _number_ of elements you add is the same? Also, do you use `pfadd` with multiple elements? – Cristian Greco Oct 16 '14 at 15:29
  • What do you mean by 'do you use pfadd with multiple elements'? By "approximately" I mean adding almost the same items to each HyperLogLog key. – Inanc Gumus Oct 17 '14 at 16:04

0 Answers