4

I have the following dataframe:

df = pd.DataFrame([
    ('A', 'A', 'Cat'),
    ('A', 'A', 'Dog'),
    ('A', 'A', 'Cat'),
    ('A', 'B', 'Dog'),
    ('B', 'B', 'Rat'),
    ('B', 'C', 'Cat'),
    ('B', 'C', 'Cat')
], columns=['id', 'group', 'Animal'])

I want to group it by id and group and calculate the occurrence of Cat in each group. An example output will be:

[2, 0, 0, 2]

2 cat in group AA,

0 cat in group AB,

0 cat in group BB,

2 cat in group BC

Can anyone help? Thanks!

Reslan Tinawi
  • 448
  • 4
  • 12
user9617403
  • 59
  • 1
  • 1
  • 3
  • Does this answer your question? [How to do a conditional count after groupby on a Pandas Dataframe?](https://stackoverflow.com/questions/45752601/how-to-do-a-conditional-count-after-groupby-on-a-pandas-dataframe) – Reslan Tinawi May 05 '21 at 07:20

4 Answers4

8

You'll need mask + groupby:

df['Animal'] = df['Animal'].mask(df['Animal'].ne('Cat'))
df.groupby(['id', 'group'])['Animal'].count().tolist()

[2, 0, 0, 2]
cs95
  • 379,657
  • 97
  • 704
  • 746
6

You can using value_counts

df.groupby(['id','group']).Animal.value_counts().unstack(fill_value=0).loc[:,'Cat'].tolist()
Out[144]: [2, 0, 0, 2]

More info return before the last step .loc

df.groupby(['id','group']).Animal.value_counts().unstack(fill_value=0)
Out[145]: 
Animal    Cat  Dog  Rat
id group               
A  A        2    1    0
   B        0    1    0
B  B        0    0    1
   C        2    0    0
BENY
  • 317,841
  • 20
  • 164
  • 234
4

Your question is very similar (if not duplicate) to this question.

When you want to perform some aggregation which are not part of pandas (e.g. sum, max, min) then you should use the apply method.

Please note that apply can be slower, so don't start using everywhere.

Now, for your question, you start with grouping by:

df.groupby(by=["id", "group"])

Then, you select the Animal column from the groups:

df.groupby(by=["id", "group"])['Animal']

At this point, you have values from the Animal column for each different group, ready for any aggregation.

Let's define a function to count Cats in a pandas Series:

def count_cat(series):
    return (series == 'Cat').sum()

All you have to do now, is to pass this function to the apply method, like the following:

df.groupby(by=["id", "group"])['Animal'].apply(count_cat).reset_index(name='count_cat')

And the result would be:

id group count_cat
0 A A 2
1 A B 0
2 B B 0
3 B C 2
Reslan Tinawi
  • 448
  • 4
  • 12
0

if 0 values are not important than you can try:

list(df[df['Animal']=='Cat'].groupby(['id', 'group']).count()['Animal'])

Sorry for my English