We are currently facing an interesting problem. We would like to estimate the cardinality of a set without the need to store every single item (bitmaps/bitsets are typically a nice approach). A very nice algorithm for this is the so-called HyperLogLog (HLL) randomized algorithm (see more here: http://antirez.com/news/75).
The problem is that HLL sketches can only be merged as UNIONs, which is basically an OR combination.
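To make the limitation concrete, here is a minimal toy HLL sketch in Python. It is not the Redis implementation from the linked article; the precision `P`, the SHA-1 based hash, and the function names `new_sketch`, `add`, `merge`, and `estimate` are all assumptions chosen for illustration. The point is that merging two sketches is just a register-wise maximum, which yields exactly the sketch of the union, and there is no equally simple register operation for the intersection.

```python
import hashlib
import math

P = 12          # assumed precision: 2**12 = 4096 registers
M = 1 << P


def _hash64(item: bytes) -> int:
    # Illustrative 64-bit hash: first 8 bytes of SHA-1.
    return int.from_bytes(hashlib.sha1(item).digest()[:8], "big")


def new_sketch():
    return [0] * M


def add(sketch, item: bytes):
    h = _hash64(item)
    idx = h >> (64 - P)                        # top P bits select the register
    rest = h & ((1 << (64 - P)) - 1)           # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1    # position of the leftmost 1-bit
    sketch[idx] = max(sketch[idx], rank)


def merge(a, b):
    # UNION (OR) of two sketches: register-wise maximum.
    return [max(x, y) for x, y in zip(a, b)]


def estimate(sketch):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)         # small-range correction
    return raw
```

Building one sketch per set and calling `estimate(merge(a, b))` gives an estimate of the union's cardinality, with 128-bit values passed in as 16-byte `bytes` objects.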
We want to combine sets not only with OR, but also with AND. We even want to mix these operations in a single expression.
Example: set1 AND (set2 OR set3) OR (set4 AND set5)
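For reference, this is what the expression computes when every set is stored exactly. It is only a baseline sketch using Python's built-in `set`, with made-up toy values, and it assumes the usual precedence where AND binds tighter than OR, i.e. `(set1 AND (set2 OR set3)) OR (set4 AND set5)`. It stores every item, which is exactly what we would like to avoid.

```python
def evaluate(set1, set2, set3, set4, set5):
    # In Python, & (AND / intersection) binds tighter than | (OR / union),
    # so this parses as (set1 & (set2 | set3)) | (set4 & set5).
    return len(set1 & (set2 | set3) | (set4 & set5))


# Toy data; real values would be 128-bit integers.
set1 = {1, 2, 3, 4}
set2 = {2, 5}
set3 = {3, 6}
set4 = {7, 8, 9}
set5 = {8, 9, 10}

print(evaluate(set1, set2, set3, set4, set5))  # prints 4
```

The result set here is `{2, 3, 8, 9}`. Memory grows linearly with the number of stored items, so this is only useful as a correctness check for whatever approximate approach is chosen.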
Each set may have a cardinality in the range of millions. Each value has a size of 128 bits.
Each set can be represented in any way, e.g. as an HLL, a bloom filter, a plain list, or a combination of these. The algorithm should run as fast as possible while using a feasible amount of space.
Any ideas?