1

I am developing a recommendation engine, so that requires storing lots of data and keeping track of every move made by the user. So, basically my website is a product search engine and will be having sets of queries as users data. Following are some examples of the data set

Example

User1 :
1. Apple Ipod tOuch
2. Samsung Galaxy Ace Plus
3. HArry Porter 

User2 :
1. Product1
2. Product2

and so on.

One way(naive) could be having an ID associated with each of my users and then having a string corresponding to that ID which will be of this form(strings separated with ~) :-

Unique ID - Apple IPod TOuch~Samsung Galaxy Ace Plus~HArry Porter 

But this method won't be efficient considering how I would be playing with those data later on.

Can any one come up with a very efficient model fairly easily implementable in mysql ?

Comment if I am unclear in asking my doubt.

Prashant Singh
  • 3,725
  • 12
  • 62
  • 106
  • 2
    Did you read anything about relational databases and normalization? – Remus Rusanu Oct 24 '12 at 06:09
  • @RemusRusanu Not yet, but tell me, I will go through it. Thanks :) – Prashant Singh Oct 24 '12 at 06:11
  • Your method breaks the first normal form. Be carefull here lies dragons. If you need to really store data like this then a NoSQL alternative might be more suited to your needs. Like Remus mentioned please check the normilization forms of relational databases and why they are there. – Namphibian Oct 24 '12 at 06:26
  • 1
    Start with http://en.wikipedia.org/wiki/Database_normalization .Follow with the links listed at the end of the Wikipedia article. – Remus Rusanu Oct 24 '12 at 07:43

3 Answers3

5

The classic design is a table for users :

Users(user_id,user_name,reg_date....)

table for products :

Products(prod_id,prod_name,prod_cost....)

table with mapping user-->products :

User_products(user_id,prod_id ....)

Example :

Users :

user_id|user_name
1200   | User1 
7856   | User2

Products :

prod_id | prod_name
12900   | Apple Ipod tOuch
45673   | Samsung Galaxy Ace Plus
99876   | HArry Porter 
34590   | Product1
56283   | Product2

User_products :

user_id | prod_id
1200    |12900
1200    |45673
1200    |99876
7856    |34590
7856    |56283
Grisha Weintraub
  • 7,803
  • 1
  • 25
  • 45
  • 2
    @PrashantSingh: I don't know where you're coming from, but the above solution is definitely the canonical method. The naive method would have just as many problems if the number of users is infinite, and searching through a comma delimited (or tilda delimited) string is slow and useless when trying to answer the question 'who has product 1'. By mistake, I marked your comment as useful; it is both insulting and not useful. – No'am Newman Oct 24 '12 at 06:30
  • @Prashant Singh - databases are usually deal with infinity sets of data, so your case is not a special. In order to claim which method is better, you have to define what kind of queries you want to support. – Grisha Weintraub Oct 24 '12 at 06:31
  • @No'amNewman I was wondering about that (as in, who would upvote that comment lol), you can cancel a comment upvote by clicking the upvote again – Esailija Oct 24 '12 at 06:31
  • @Esailija: clicking on the up arrow doesn't do anything (at least, not for me). Maybe someone else can downnvote that comment? – No'am Newman Oct 24 '12 at 06:34
  • @No'amNewman Ah ok then, the feature must have been removed or I have misunderstood it... and no, one can't downvote comments :( – Esailija Oct 24 '12 at 06:36
  • @all Sorry, for that comment. I have already deleted it. What I thought that how will I be mapping from user to products? How will I remember the mappings ? – Prashant Singh Oct 24 '12 at 06:41
  • @PrashantSingh - see the example with data from your question. – Grisha Weintraub Oct 24 '12 at 06:54
  • @Grisha Yeah, that's fine. Was already aware about this method. But imagine if Google product search uses this to store history of its users. One day it will cross all limits i guess. I was doubtful about that thing. User_Products table will grow exponentially in this method. Is there any way like lots of IDs being mapped to same query, but that will require storing product ID or something at users table. So, your solution is pretty complicated. Are there no solutions available better than this ? Just wanna know. No insulting, nothing , just open view – Prashant Singh Oct 24 '12 at 07:13
1

Avoid strings separated with some identifier because you'll have to work with the data submitted, otherwise your search engine will be very slow when you'll get really big amount of data. I think Grisha is absolutely right - user or product searches (numeric id searches), joined with mapping tables will output the result much faster than searches through text/varchar fields, separating the results, etc.

Adi
  • 49
  • 2
1

Using the canonical approach as proposed by Grisha, the query 'who has product 1' would be represented thus

select user.user_name
from users inner join user_products on users.user_id = user_products.user_id
inner join products on products.prod_id = user.products.prod_id
where products.prod_name = 'Product 1'

This may look complicated but it's actually very simple and very powerful. If there were another field in the user_products table such as purchase date, you could also find out when those users bought product 1, or find all the users who bought the product during a given period, by means of a simple extension to the query.

No'am Newman
  • 6,395
  • 5
  • 38
  • 50
  • It needn't look quite so complicated. In this case, one could do `SELECT user_name FROM users NATURAL JOIN user_products NATURAL JOIN products WHERE prod_name = 'Product 1'`. However, I generally avoid `NATURAL JOIN` for the same reason that I avoid `SELECT *`; one can instead compromise half-way with `JOIN ... USING()`. – eggyal Oct 24 '12 at 06:45