
The scenario is simple: I add 50 elements (different each time) to a HyperLogLog (HLL). Usually on the third batch, PFCOUNT returns a wrong value (151 instead of 150). I know that HLL has a low error rate, but is it really this easy to get a wrong count? Can this error be handled?

Thanks in advance

Here are the logs.

127.0.0.1:6379> PFADD test DaG4yPCb vrTDeJde SCcK4rvG K0UJPxeT s1RtvWyf EpkUaxhY y4ot0BQW vt13T2eS 5rFe0TKj yXm25gXb 4nnw8YYy Fnqdb4C6 rwuPLUyC W9uS0az7 koOtrENo hIjAa00k eT3VvI7Q zQVhYnYY 1Cshhbbk 8q3B82gH NWlnW5QH fbNYBXoy 4ti95TeI TiUyXs0W TAepHjdd CK26UGuC ESt9opXO ihYIo1L9 0XqFKx8x coh31ZxE 01G7eCjb wJZYByUo ZHfJIKoQ tFGPsdgZ 19DUQvNX 20QtyIVq Xjx4wT9z nJazaXtH cHEqmQjZ hz8j0uhT hpeygfWk hWBf44rU iUJbsPSY nIYDiV80 FgaEU3pI 7EEkDGY6 tPF0KHFM twVbY3wR xFpEg4jP 4JEW0pue
127.0.0.1:6379> PFCOUNT test
(integer) 50

127.0.0.1:6379> PFADD test elapxije pbjtcvbg pjoiaarc pogpnjqd ujzfiuyu kykxhqpl hnkwmwpq gljpsnwu rlnflrdb wexqthqe hwbcgbvt yjdddtpo lnkqcoaz tcjgnxme aiflckyh rfsmwzgw eooownar pkvhdwae tywuoxgv mojqkmqd gepsxhqj cbgrmzih jkormrfk irasppno mmealsye fdumtspr anisssut tuqlufyr coqebpyn zijsoauj akvcvkda jruskmma kalinqpr lsazgswh ozyajcpm edvodqnt befvtsbx bcaurnjh psgdgval pyktekgo kucfjnov xruaulrl rrwqzjac ppbbhdhz iohaeoiq fbztqesn zsfnxzsa masqfqjo fsybqced xzfdhtzv
(integer) 1
127.0.0.1:6379> PFCOUNT test
(integer) 100

127.0.0.1:6379> PFADD test hukqyega olgswnll ufzjkscd oygfsgdu bttlwivr xrvtjsfc criuaabz idxilrvd kitvpuzb ehwrvcip ljthitya clgciaex bagxomaq ziszyehx uuhytedx xycrfcgf nmbnxkav ylxxyyrp rfwniodp vezvqefz gomrekbf tirdnpbp fpbokjjz dwppiomo zgypqxyh kavukjeb wsomngmh oawosnvf tinruzjc bbfqchbn airifskr dqcaznzt vnpfejep jmdlwbek eubhstbo iamgnktp gfojfegy hvmbszlu poauswtc tdgozdfy cxdsprqo pjsuxult nctztxwb fbayirlw dcitezyn zufryoro tisxdwtn mmgztjie vykdkvwm dqogmhnm
(integer) 1
127.0.0.1:6379> PFCOUNT test
(integer) 151

lordav

1 Answer


From https://redis.io/commands/PFCOUNT:

The returned cardinality of the observed set is not exact, but approximated with a standard error of 0.81%.

In your case the count is off by 1 out of 150, a relative error of 1/150 ≈ 0.67%, which is within the documented standard error.
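To make the arithmetic concrete, here is a small sketch that compares the observed error with the documented figure. It assumes the 0.81% comes from the usual HyperLogLog estimate 1.04/sqrt(m) with m = 16384 registers, which is how Redis sizes its HLL; `relative_error` is just an illustrative helper name, not a Redis function.

```python
import math

# Assumption: Redis HLL uses 16384 registers, and the standard error of a
# HyperLogLog estimate is roughly 1.04 / sqrt(number of registers).
REGISTERS = 16384
standard_error = 1.04 / math.sqrt(REGISTERS)


def relative_error(estimate: int, exact: int) -> float:
    """Relative deviation of an estimated cardinality from the exact one."""
    return abs(estimate - exact) / exact


# Values from the question: PFCOUNT returned 151 for 150 distinct elements.
observed = relative_error(151, 150)

print(f"standard error: {standard_error:.4%}")
print(f"observed error: {observed:.4%}")
print("within one standard error:", observed <= standard_error)
```

Note that the standard error is a statistical measure, not a hard bound, so individual counts can occasionally deviate by more than 0.81%.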

Anton
  • Is there a way to solve the count-distinct problem in Redis without errors? Sets are not the right solution because of their performance and memory cost. – lordav Jan 26 '22 at 20:41
  • If you need the exact count, you can try bitmaps. A bitmap still grows linearly with the number of members, but it might be significantly smaller than sets if you need to maintain multiple unique counters. – Anton Jan 26 '22 at 20:57
  • Thanks, Anton. Have you ever used a bitmap to solve the count-distinct problem? Do you have references to GitHub projects or other materials? – lordav Jan 27 '22 at 10:27
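
As a rough illustration of the bitmap idea from the comments, here is an in-memory sketch that mimics the semantics of the Redis commands SETBIT and BITCOUNT without needing a server. It assumes every member can be mapped to a small non-negative integer ID (for example a numeric user ID); the random strings in the question would first need such a mapping, which is not shown. The `Bitmap` class and the sample IDs are hypothetical and only for illustration.

```python
class Bitmap:
    """In-memory stand-in for a Redis string used as a bitmap."""

    def __init__(self) -> None:
        self._bits = bytearray()

    def setbit(self, offset: int, value: int = 1) -> int:
        """Mimic SETBIT: set the bit at `offset` and return its previous value."""
        byte, bit = divmod(offset, 8)
        if byte >= len(self._bits):
            # Grow with zero bytes, as Redis pads the string when needed.
            self._bits.extend(bytes(byte - len(self._bits) + 1))
        mask = 0x80 >> bit  # Redis numbers bits from the most significant bit
        previous = 1 if self._bits[byte] & mask else 0
        if value:
            self._bits[byte] |= mask
        else:
            self._bits[byte] &= ~mask & 0xFF
        return previous

    def bitcount(self) -> int:
        """Mimic BITCOUNT: number of bits set to 1."""
        return sum(bin(b).count("1") for b in self._bits)


# Hypothetical integer IDs; duplicates do not change the count.
visitors = Bitmap()
for user_id in [3, 7, 7, 42, 3, 1000]:
    visitors.setbit(user_id)

exact_count = visitors.bitcount()
print(exact_count)
```

Against a real server the same idea would be `SETBIT key <id> 1` per member and `BITCOUNT key` for the exact count. The memory used is determined by the largest ID rather than by the number of members, so this only pays off when the IDs are reasonably dense.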