Pandas drop duplicates within groupby

Question

This is what my CSV looks like:

name, cuisine, review
A, Chinese, this
A, Indian, is
B, Indian, an
B, Indian, example
B, French, thank
C, French, you

I'm trying to count how many names each kind of cuisine appears under. This is what I should be getting:

Cuisine, Count
Chinese, 1
Indian, 2
French, 2

But as you can see, there are duplicates within a name (e.g. B has Indian twice), so I tried drop_duplicates, but it doesn't work. I used

df.groupby('name')['cuisine'].drop_duplicates()

and I get an error saying this cannot be done on a SeriesGroupBy object.

Somehow I need to apply value_counts() to get the number of occurrences of each cuisine, but the duplicates are getting in the way. Any idea how I can do this in pandas? Thanks.

cs95 · Accepted Answer · 2018-11-09T03:30:04.030

4

You're looking for groupby and nunique:

df.groupby('cuisine', sort=False).name.nunique().to_frame('count')

         count
cuisine       
Chinese      1
Indian       2
French       2

This returns the number of unique names per cuisine.
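For a self-contained check, here is a minimal sketch that rebuilds the sample data inline. The io.StringIO source and skipinitialspace=True are my assumptions (the CSV in the question has a space after each comma, which would otherwise end up in the column names and values). It also shows the drop_duplicates route the question was reaching for: deduplicate the (name, cuisine) pairs on the DataFrame itself rather than on the groupby object, then call value_counts.

```python
import io

import pandas as pd

# Sample data copied from the question; inline text stands in for the real file.
csv_text = """name, cuisine, review
A, Chinese, this
A, Indian, is
B, Indian, an
B, Indian, example
B, French, thank
C, French, you
"""

# skipinitialspace=True strips the space after each comma (an assumption
# about how the file is formatted, based on the sample shown).
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Approach from the answer: number of distinct names per cuisine.
counts = df.groupby('cuisine', sort=False)['name'].nunique()

# Alternative: drop repeated (name, cuisine) pairs, then count cuisines.
alt = df.drop_duplicates(['name', 'cuisine'])['cuisine'].value_counts()

print(counts)
print(alt)
```

Both give the same numbers; value_counts orders by count rather than by first appearance, so only the row order may differ.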

edited Nov 09 '18 at 03:30

answered Nov 09 '18 at 03:24


score 2 · Answer 2 · answered Nov 09 '18 at 03:27


Using crosstab, then counting the names with a nonzero entry per cuisine:

pd.crosstab(df.name,df.cuisine).ne(0).sum()
Out[550]: 
cuisine
 Chinese    1
 French     2
 Indian     2
dtype: int64
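To see why this works, here is a rough step-by-step sketch of the same chain. The DataFrame is rebuilt by hand from the question's sample (without the leading spaces), and the intermediate names table and present are only for illustration.

```python
import pandas as pd

# Sample data from the question, typed in directly.
df = pd.DataFrame({
    'name': ['A', 'A', 'B', 'B', 'B', 'C'],
    'cuisine': ['Chinese', 'Indian', 'Indian', 'Indian', 'French', 'French'],
})

# Step 1: name x cuisine table of raw row counts (B / Indian is 2 here).
table = pd.crosstab(df['name'], df['cuisine'])

# Step 2: True wherever a name has that cuisine at least once,
# which collapses the duplicate rows for B / Indian into a single True.
present = table.ne(0)

# Step 3: summing each column counts the names per cuisine.
result = present.sum()

print(table)
print(result)
```

Note that crosstab sorts the cuisine labels, so the result comes back in alphabetical order rather than in order of first appearance.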


BENY

