Context:
I have a large data set consisting of 3 columns: individual users, rating of movie, and movie ID.
Problem:
Each row records one user rating one movie.
The same user can appear on multiple rows, one for each movie they have rated.
I want to keep only the users who have rated a MINIMUM of 3 movies (meaning they appear in 3 or more rows under the user column) and drop the users who have rated only 1 or 2 movies.
I will put some examples below.
df.head()
userId movieId rating
0 1 307 3.5
1 1 481 3.5
2 1 1091 1.5
3 1 1257 4.5
4 1 1449 4.5
#So for example, userId 1 above is a user I would like to keep because
#they have rated at least 3 movies (5 in this case).
userId movieId rating
5 5 645 3.5
6 5 5678 3.5
7 6 5346 1.5
8 6 1434 4.5
9 7 7421 4.5
#In the above example, users 5, 6 and 7 are prime examples of users I would like
#to drop since they have not rated a minimum of 3 movies (2, 2 and 1 in this case).
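
For reference, here is a minimal sketch of the filter described above. It assumes df is a pandas DataFrame with the columns shown, and it rebuilds the ten sample rows so it can run on its own. The names MIN_RATINGS, ratings_per_user and filtered are made up for illustration, and groupby with transform is only one possible approach, not necessarily the fastest on a large data set.

```python
import pandas as pd

# Sample rows copied from the two excerpts above
df = pd.DataFrame({
    "userId":  [1, 1, 1, 1, 1, 5, 5, 6, 6, 7],
    "movieId": [307, 481, 1091, 1257, 1449, 645, 5678, 5346, 1434, 7421],
    "rating":  [3.5, 3.5, 1.5, 4.5, 4.5, 3.5, 3.5, 1.5, 4.5, 4.5],
})

MIN_RATINGS = 3  # hypothetical name for the threshold in the question

# Count the rows belonging to each user and broadcast that count
# back onto every row, so it lines up with df's index
ratings_per_user = df.groupby("userId")["userId"].transform("size")

# Keep only the rows whose user appears in at least MIN_RATINGS rows
filtered = df[ratings_per_user >= MIN_RATINGS]

print(filtered)
```

On the sample rows this should keep the five rows for userId 1 and drop users 5, 6 and 7. An equivalent spelling would be df.groupby("userId").filter(lambda g: len(g) >= MIN_RATINGS), which reads more directly but calls the lambda once per user.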