I have huge quantities of data represented as (for example) -
User ID | Gender | Location | Type of User
There may be more columns depending on the use case. The location is denoted by a pincode.
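I recently read about HyperLogLog and its Redis implementation. With it I can, for example, conveniently get a count of just the male users or of users of a certain "type", and I can combine these HyperLogLog sets (a merge gives the union; an intersection can then be estimated from unions by inclusion-exclusion) to answer questions like -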
Count of Unique Users who are male and of type A
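To make the "male and of type A" count concrete, here is a minimal sketch in Python. It assumes a toy HyperLogLog class (`TinyHLL`, a hypothetical name, not the Redis implementation) and made-up user ID ranges. Merging sketches only yields the union, so the intersection is estimated as |male| + |type A| - |male or type A|, and its error grows as the overlap gets small relative to the two sets.

```python
import hashlib
import math


class TinyHLL:
    """Toy HyperLogLog for illustration only (hypothetical, not Redis's code)."""

    def __init__(self, p=14):
        self.p = p                       # number of bits used to pick a register
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from the item.
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                      # top p bits: register index
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        # Position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Register-wise max gives the sketch of the UNION of both sets.
        out = TinyHLL(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        est = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            est = self.m * math.log(self.m / zeros)   # small-range correction
        return est


# Made-up data: user IDs 0..149999 are male, 100000..249999 are of type A,
# so the true overlap is 50000 users and the true union is 250000 users.
male, type_a = TinyHLL(), TinyHLL()
for user_id in range(0, 150_000):
    male.add(user_id)
for user_id in range(100_000, 250_000):
    type_a.add(user_id)

union = male.merge(type_a)
# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
intersection_estimate = male.count() + type_a.count() - union.count()
print(round(union.count()), round(intersection_estimate))
```

With Redis, the same idea would presumably map to `PFADD` for adding users, `PFMERGE` (or `PFCOUNT` over several keys) for the union, and the subtraction done client-side.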
The problem is when I have to deal with columns like location. I cannot store sets for each possible pincode. So a question like -
Count of unique users who are male and belong to pincodes A and B
is hard to answer using this method.
Using HyperLogLog or Redis is not a requirement. I am open to using any available tool, provided it solves the problem.