Pandas: Filter Column by how many times the item shows up in a DataFrame

Asked May 07 '19 at 19:19

Active May 07 '19 at 19:19

Context:

I have a large data set with 3 columns: the user ID (userId), the movie ID (movieId), and the rating that user gave the movie (rating).

Problem:

Each row records one user rating one movie.

The same user can appear on multiple rows, one for each movie they have rated.

I want to keep only the users who have rated a MINIMUM of 3 movies (meaning they appear in 3 or more rows of the userId column) and drop the users who have rated only 1 or 2 movies.

I will put some examples below.

df.head()

   userId  movieId  rating
0       1      307     3.5
1       1      481     3.5
2       1     1091     1.5
3       1     1257     4.5
4       1     1449     4.5

#So, for example, userId 1 above is a user I would like to keep because
#they have rated at least 3 movies (5 in this case).


   userId  movieId  rating
5       5      645     3.5
6       5     5678     3.5
7       6     5346     1.5
8       6     1434     4.5
9       7     7421     4.5

#In the above example, users 5, 6, and 7 are prime examples of users I would like
#to drop since they have not rated a minimum of 3 movies (2, 2, and 1 in this case).

asked May 07 '19 at 19:19

Christian Torres

Look at `value_counts`. Then just select values over the threshold and index your DataFrame. – user3483203 May 07 '19 at 19:23

`df.set_index('userId')[df.userId.value_counts().ge(3)]` – user3483203 May 07 '19 at 19:24
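For readers who want to try the `value_counts` idea from the two comments above, here is a minimal sketch. It rebuilds the ten sample rows from the question as a small DataFrame. It then selects rows with `isin` rather than `set_index`, which is a substitution on my part (not what the comment does) so that `userId` stays a regular column. The names `counts`, `keep_ids`, and `filtered` are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question (userId 1 has 5 ratings; 5, 6, 7 have fewer than 3).
df = pd.DataFrame({
    "userId":  [1, 1, 1, 1, 1, 5, 5, 6, 6, 7],
    "movieId": [307, 481, 1091, 1257, 1449, 645, 5678, 5346, 1434, 7421],
    "rating":  [3.5, 3.5, 1.5, 4.5, 4.5, 3.5, 3.5, 1.5, 4.5, 4.5],
})

# Count how many rows each userId appears in.
counts = df["userId"].value_counts()

# Keep the ids whose count meets the threshold (3 or more ratings).
keep_ids = counts[counts >= 3].index

# Select only the rows belonging to those users; userId remains a column.
filtered = df[df["userId"].isin(keep_ids)]
print(filtered)
```

On the sample data this should leave only the five rows for userId 1.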

Or `df[df.groupby('userId')['userId'].transform('count') >= 3].copy()` – BENY May 07 '19 at 19:25

You can use `filter`: `df.groupby('userId').filter(lambda x: len(x) >= 3)` – Vaishali May 07 '19 at 19:26
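The two `groupby` suggestions can be compared side by side. This is a rough sketch under the assumption that the data looks like the sample rows in the question. `MIN_RATINGS`, `via_transform`, and `via_filter` are names made up for illustration. Both variants compare with `>= 3` so that a user with exactly three ratings is kept, matching the "minimum of 3" requirement.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "userId":  [1, 1, 1, 1, 1, 5, 5, 6, 6, 7],
    "movieId": [307, 481, 1091, 1257, 1449, 645, 5678, 5346, 1434, 7421],
    "rating":  [3.5, 3.5, 1.5, 4.5, 4.5, 3.5, 3.5, 1.5, 4.5, 4.5],
})

MIN_RATINGS = 3  # hypothetical name for the threshold in the question

# transform('count') broadcasts each user's row count back onto every row,
# giving a Series aligned with df that can be used as a boolean mask.
per_row_count = df.groupby("userId")["userId"].transform("count")
via_transform = df[per_row_count >= MIN_RATINGS].copy()

# filter() calls the function once per group and keeps whole groups
# for which it returns True.
via_filter = df.groupby("userId").filter(lambda g: len(g) >= MIN_RATINGS)

print(via_transform.equals(via_filter))
```

The `transform` version avoids calling a Python function per group, which may matter on a large data set with many users; the `filter` version is arguably easier to read.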
