I'm trying to decide a db design that will allow enable fast SELECT
of users according to cross-attributes. I have two kind of attributes:
- Full - each one has these. E.g. location / gender / age etc..
- Slim - e.g. tags/interests. Most users have ~7 out of 50k possible options. These aren't distributed homogenously, e.g. many users have the interest
Music
, but only a few have the interestFunk Rock
.
The data set is 10's of Millions of people, so I'm trying to avoid JOINS.
Saving the data:
For each full attribute I can just keep a column per user and SELECT
accordingly. For the slim attributes I was thinking of creating another table, in this case each user has more than one line, and each line represents an attribute.
SELECT of slim attributes is where I'm facing the problems. Assuming I'm not JOINING the tables in the SELECT to avoid the low performance, I was thinking of dividing the SELECT into two different use cases:
- Searching for a popular slim-attribute, say interest
Music
which will first find C*SearchAmount users matching it from the slim attributes tables, and then filter them according to the full attributes tables, and if filtering too much do this again with bigger C. - Searching for a rare slim-attribute, doing it the opposite way around.
Before implementing this, I wanted to hear of other/better ways to solve this problem.