0

I'm trying to decide a db design that will allow enable fast SELECT of users according to cross-attributes. I have two kind of attributes:

  1. Full - each one has these. E.g. location / gender / age etc..
  2. Slim - e.g. tags/interests. Most users have ~7 out of 50k possible options. These aren't distributed homogenously, e.g. many users have the interest Music, but only a few have the interest Funk Rock.

The data set is 10's of Millions of people, so I'm trying to avoid JOINS.

Saving the data:

For each full attribute I can just keep a column per user and SELECT accordingly. For the slim attributes I was thinking of creating another table, in this case each user has more than one line, and each line represents an attribute.

SELECT of slim attributes is where I'm facing the problems. Assuming I'm not JOINING the tables in the SELECT to avoid the low performance, I was thinking of dividing the SELECT into two different use cases:

  1. Searching for a popular slim-attribute, say interest Music which will first find C*SearchAmount users matching it from the slim attributes tables, and then filter them according to the full attributes tables, and if filtering too much do this again with bigger C.
  2. Searching for a rare slim-attribute, doing it the opposite way around.

Before implementing this, I wanted to hear of other/better ways to solve this problem.

Noam
  • 3,341
  • 4
  • 35
  • 64
  • 1
    *"The data set is 10's of Millions of people, so I'm trying to avoid JOINS."* that would be your problem there's no reason to avoid joins no matter how large the data set is –  Oct 31 '12 at 09:06
  • possible duplicate of [handling large dataset using MySQL](http://stackoverflow.com/questions/5898741/handling-large-dataset-using-mysql) –  Oct 31 '12 at 09:10
  • @Dagon thanks for the link, I'm checking relevancy. Regarding your comment on JOINS, can you elaborate? if I can find a way to put the data in the same table and avoid the JOIN, wouldn't that be much better? – Noam Oct 31 '12 at 09:15
  • no, normalisation would be much better, avoiding joins usually means you are not doing that. –  Oct 31 '12 at 18:49

0 Answers0