
I am looking to do probabilistic counting and set membership using structures such as Bloom filters and HyperLogLog.

Is there any support for using such data structures and performing operations on them atomically on the server-side, through user-defined functions or similar? Or any way for me to add extensions with such functionality?

(I could ingest the data through another system and batch the updates to reduce the contention, but it would be far simpler if all this could be handled in the database server.)

Daniel Siegmann

1 Answer


You have to implement them client-side. A common approach is to keep the HLL in memory in your own system, serialize and insert it every X minutes, and then merge the stored sketches on read across the range you are interested in (maybe using an RRD-style approach to roll up periods longer than X minutes). This is not very durable, since whatever is still in memory is lost if the process dies before the next flush, so depending on the use case you may need something more complex.
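To make the flush-and-merge idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `HyperLogLog` class is a hand-rolled toy (not a Cassandra or driver API), the `buckets` dict stands in for whatever table you would write the serialized blobs to, and the precision `p=12` and the SHA-1 hash are arbitrary choices. The useful property is that merging is just an element-wise max over registers, so per-period sketches can be combined on read.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p one-byte registers (illustrative only)."""

    def __init__(self, p=12, registers=None):
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(registers) if registers is not None else bytearray(self.m)
        if len(self.registers) != self.m:
            raise ValueError("register blob does not match precision p")

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash value.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Union of two sketches: element-wise max of the registers.
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = bytearray(max(a, b) for a, b in zip(self.registers, other.registers))

    def to_bytes(self):
        # Serialized form, suitable for storing as an opaque blob.
        return bytes(self.registers)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical stand-in for the table: one serialized sketch per time bucket.
buckets = {}
for bucket, ids in (("12:00", range(0, 1000)), ("12:05", range(500, 1500))):
    sketch = HyperLogLog()
    for i in ids:
        sketch.add(f"user-{i}")
    buckets[bucket] = sketch.to_bytes()           # the periodic "flush"

# Read path: load the blobs for the range of interest and merge them.
merged = HyperLogLog()
for blob in buckets.values():
    merged.merge(HyperLogLog(registers=blob))
estimate = merged.count()
```

With `p=12` each sketch is a 4096-byte blob and the typical relative error is on the order of a couple of percent. The two buckets above overlap by 500 ids, and the merged estimate counts those only once, which is what makes merge-on-read across periods work.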

Although this seems like a close fit for Cassandra (C*), I think one of the big issues is deletes, though you can probably work around them. There is a proof of concept of a C*-side implementation here:

http://vilkeliskis.com/blog/2013/12/28/hacking_cassandra.html

You can likely get it working "well enough". https://issues.apache.org/jira/browse/CASSANDRA-8861 may also be something to watch.

Chris Lohfink