0

By referencing Collaborative filtering in MySQL? , I have created the following ones:

CREATE TABLE `ub` (
  `user_id` int(11) NOT NULL,
  `book_id` varchar(10) NOT NULL,
  `rate` int(11) NOT NULL,
  PRIMARY KEY (`user_id`,`book_id`),
  UNIQUE KEY `book_id` (`book_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into ub values (1, 'A', '8'), (1, 'B', '7'), (1, 'C', '10');
insert into ub values (2, 'A', '8'), (2, 'B', '7'), (2, 'C', '10'), (2,'D', '8'), (2,'X', '7');
insert into ub values (3, 'X', '10'), (3, 'Y', '8'), (3, 'C', '10'), (3,'Z', '10');
insert into ub values (4, 'W', '8'), (4, 'Q', '8'), (4, 'C', '10'), (4,'Z', '8');

Then, I can able to get the following table and understand how it works.

create temporary table ub_rank as 
select similar.user_id,count(*) rank
from ub target 
join ub similar on target.book_id= similar.book_id and target.user_id != similar.user_id and target.rate= similar.rate
where target.user_id = 1
group by similar.user_id;

select * from ub_rank;

+---------+------+
| user_id | rank |
+---------+------+
|       2 |    3 |
|       3 |    1 |
|       4 |    1 |
+---------+------+

However, I start to be confused after the following code.

select similar.rate, similar.book_id, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id 
left join ub target on target.user_id = 1 and target.book_id = similar.book_id and target.Rate= similar.Rate 
where target.book_id is null
group by similar.book_id
order by total_rank desc, rate desc;

+---------+------------+
| book_id | total_rank |
+---------+------------+
| X       |          4 |
| D       |          3 |
| Z       |          2 |
| Y       |          1 |
| Q       |          1 |
| W       |          1 |
+---------+------------+



   (1, 'A', '8'), (1, 'B', '7'), (1, 'C', '10');

   (2, 'A', '8'), (2, 'B', '7'), (2, 'C', '10'), (2,'D', '8'), (2,'X', '7');

What I wanna do is that, suppose user 1 and 2 have similar behavior ( chosen A,B,C before with matched rating), thus I will recommend D to user A , as it has a higher rate.

Seems the code above not to do so? As, the first ranked is X. How can I change the code in order to achieve the goal mentioned?

Or, actually does the existing method is a better/more accuracy for recommendation?

Community
  • 1
  • 1
HUNG
  • 525
  • 7
  • 17
  • Note that in your last query, you have omitted the results from the `rate` column, and also that those results will essentially be random (since `similar.rate` is not aggregated, grouped on or functionally dependent on a grouping item). –  Mar 26 '13 at 12:15
  • @Mark Bannister I am not very familiar with it, would u mind give me more hints ? – HUNG Mar 26 '13 at 12:20
  • I don't understand your response - did you understand my comment? –  Mar 26 '13 at 12:22
  • For the rate column, seems that I need to add some extra conditions or something else to make the rate column work? – HUNG Mar 26 '13 at 12:27
  • Yes - it depends on what you want the rate column to represent. At the moment each book can have many different rate values (one from each user), so your existing query will return one of those values at random. –  Mar 26 '13 at 12:31

1 Answers1

2

The existing query is ranking the results based on the total value of rank for each book, and then using rate as a tie-break for books which have the same total rank. (Also, rate will essentially be random since similar.rate is not aggregated, grouped on or functionally dependent on a grouping item in the query.)

As such, X will be ranked higher than D because it has been chosen by one user of rank 3 and one user of rank 1, giving a total rank of 4, whereas D has only been chosen by one user of rank 3.

You could change the query to include a rating element weighted by ranking - for example:

select similar.book_id, 
       sum(ub_rank.rank) total_rank, 
       sum(ub_rank.rank*similar.rate) wtd_rate
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id 
left join ub target on target.user_id = 1 and target.book_id = similar.book_id and target.Rate= similar.Rate 
where target.book_id is null
group by similar.book_id
order by wtd_rate desc, total_rank desc

- although in this case this will still rank X higher, as it has a rating of 7 from a user of rank 3 plus a rating of 10 from a user of rank 1, giving a total rank of 31, compared with D's total rank of 24.

(SQLFiddle here)

If you want X to rank higher than D, you need to decide what criteria you are going to use that would rank X higher than D.