1

I have a table of movie ratings that contains millions of rows containing userid's, movieid's and ratings.

| userId | movieId | rating |
------------------------------
| 1      | 213     | 5      |
| 1      | 245     | 4      |
| 2      | 213     | 4      |
| 2      | 245     | 4      |
| 3      | 657     | 5      |
| 3      | 245     | 5      |

I'm trying to figure out a way of grouping together userId's that contain matching sets of movieId's. Ideally I want the query to only find matches if they have at least 5 movieId's in common and if the rating is above 4, but I've simplified it for this example.

In the instance above, userId 1 and 2 would be the only users that match as they both contain the same movieIds. I need a statement that would essentially replicate this. Thanks in advance for any help.

dmbll
  • 113
  • 6

2 Answers2

2

You can perform a self-join on matching movies, filter out records with uninteresting ratings, group by user-pairs and then filter the resulting groups for only those that have at least the requisite number of matching records:

SELECT   a.userId, b.userId
FROM     myTable a JOIN myTable b USING (movieId)
WHERE    a.userId < b.userId
     AND a.rating > 4
     AND b.rating > 4
GROUP BY a.userId, b.userId
HAVING   COUNT(*) >= 5
eggyal
  • 122,705
  • 18
  • 212
  • 237
  • I like the look of this and think it might work, although I have an issue querying that many rows which brings the server to a standstill. – dmbll Jan 29 '16 at 10:39
  • @user3612532: What indexes are defined on the table? – eggyal Jan 29 '16 at 10:41
  • Just a primary key on a ratingsId column, though I don't think it's doing a lot. I don't have a great deal of experience with SQL so apologies if this is all basic stuff. – dmbll Jan 29 '16 at 12:44
1
select movieId, rating 
from tablename
group by movieId 
having count(userId) > 1 and rating > 4;

this gives me movieId 245 and rating 5, which should be correct according to your provided example data, have more than 1 userId and a rating greater than 4.

mazedlx
  • 1,405
  • 17
  • 24