Hi all, and thanks in advance. I'm new to the NoSQL game, but my current place of employment has tasked me with set comparisons on some big data.
Our system has customer tag sets and targeted tag sets.
A tag is an 8-digit number.
A customer tag set may contain up to 300 tags but averages 100 tags.
A targeted tag set may contain up to 300 tags but averages 40 tags.
Pre-calculating is not an option, as we are shooting for a potential customer base of a billion users.
(These tags are hierarchical, so having one tag implies that you also have its parent and ancestor tags. Put that info aside for the moment.)
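For what it's worth, here is a toy sketch of that hierarchy rule, with a made-up parent map (the real tag values and tree are obviously different):

```python
# Hypothetical illustration of the hierarchy rule: owning a tag
# implies owning all of its ancestors.
parents = {10000003: 10000002, 10000002: 10000001}  # child -> parent (made up)

def with_ancestors(tags):
    """Expand a tag set to include every ancestor of every tag."""
    closed = set(tags)
    for t in tags:
        while t in parents:          # walk up the tree to the root
            t = parents[t]
            closed.add(t)
    return closed

print(sorted(with_ancestors({10000003})))  # [10000001, 10000002, 10000003]
```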
When a customer hits our site, we need to intersect their tag set against one million targeted tag sets as fast as possible. To match, the customer set must contain every element of the targeted set (i.e., the targeted set must be a subset of the customer set).
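To make the matching rule concrete, here is the brute-force check I have in mind, with made-up tag values and set names:

```python
# Brute-force matching: a targeted set matches a customer only when
# every one of its tags appears in the customer's tag set.
customer_tags = {10000001, 10000002, 10000003, 10000004}

# Hypothetical targeted sets (in production there would be a million of these).
targeted_sets = {
    "campaign_a": {10000001, 10000003},   # all tags present -> match
    "campaign_b": {10000002, 99999999},   # 99999999 missing  -> no match
}

matches = [name for name, tags in targeted_sets.items()
           if tags <= customer_tags]      # Python's subset test
print(matches)  # -> ['campaign_a']
```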
I have been exploring my options, and the set intersection in Redis seems like it would be ideal. However, my trawling through the internet has not revealed how much RAM would be required to hold one million tag sets. I realize the intersection would be lightning fast, but is this a feasible solution with Redis?
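In case it helps anyone sanity-check me, here is the back-of-envelope RAM estimate I came up with. The per-key overhead is a guess on my part; the only things I'm relying on are that 8-digit numbers fit in 32 bits and that Redis keeps small integer-only sets in its compact intset encoding (the default `set-max-intset-entries` is 512, so even 300-tag sets should qualify):

```python
# Rough RAM estimate for one million targeted tag sets in Redis.
num_sets = 1_000_000
avg_tags = 40           # average targeted-set size from above
bytes_per_tag = 4       # an 8-digit number fits in an int32 intset entry
per_key_overhead = 90   # GUESS: key string + hash-table entry + object header

est_bytes = num_sets * (avg_tags * bytes_per_tag + per_key_overhead)
print(f"~{est_bytes / 1024**2:.0f} MB")  # a few hundred MB, before fragmentation
```

If that estimate is even roughly right, the data would fit comfortably on one box; it's the one-million-intersections-per-request part that worries me.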
I realize this is brute force and inefficient. I also wanted to use this question as a means to get suggestions for how this type of problem has been handled in the past. As stated before, the tags are stored in a tree. I have begun looking at MongoDB as a possible solution as well.
Thanks again