Context:

I have a large data set with 3 columns: user ID, movie ID, and rating.

Problem:

Each row represents one user rating one movie.

The same user can appear on multiple rows, one row per movie they have rated.

I want to filter the data so that it only keeps users who have rated a MINIMUM of 3 movies (meaning their user ID appears in 3 or more rows), and drops all rows for users who have rated only 1 or 2 movies.

I will put some examples below.

df.head()

userId  movieId rating
0   1   307     3.5
1   1   481     3.5
2   1   1091    1.5
3   1   1257    4.5
4   1   1449    4.5 

#So for example, userId 1 above is a user I would like to keep because
#they have rated at least 3 movies (5 in this case).


userId  movieId rating
5   5   645     3.5
6   5   5678    3.5
7   6   5346    1.5
8   6   1434    4.5
9   7   7421    4.5 

#In the above example, users 5, 6, and 7 are prime examples of users I would like
#to drop since they have not rated a minimum of 3 movies (2, 2, and 1 respectively).
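
To make the desired result concrete, here is a minimal sketch of one way the filter might be written in pandas. It assumes the column names shown above (`userId`, `movieId`, `rating`) and rebuilds only the ten sample rows. `MIN_RATINGS`, `ratings_per_user`, and `filtered` are names made up for illustration.

```python
import pandas as pd

# Sample rows copied from the two snippets above (not the full data set).
df = pd.DataFrame({
    "userId":  [1, 1, 1, 1, 1, 5, 5, 6, 6, 7],
    "movieId": [307, 481, 1091, 1257, 1449, 645, 5678, 5346, 1434, 7421],
    "rating":  [3.5, 3.5, 1.5, 4.5, 4.5, 3.5, 3.5, 1.5, 4.5, 4.5],
})

MIN_RATINGS = 3  # hypothetical threshold name: keep users with at least this many rows

# Count the rows belonging to each user and broadcast that count back onto
# every row, so the result lines up with df row by row.
ratings_per_user = df.groupby("userId")["movieId"].transform("size")

# Keep only the rows whose user meets the threshold.
filtered = df[ratings_per_user >= MIN_RATINGS]

print(filtered["userId"].unique())  # [1]
```

On the sample rows only user 1 survives, which matches what I am after. `df.groupby("userId").filter(lambda g: len(g) >= MIN_RATINGS)` should give the same rows, though I would expect it to be slower on a large data set because it calls a Python function once per user.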

Christian Torres