6

So I have a df like this:

NAME    TRY SCORE  
Bob   1st   3  
Sue   1st   7  
Tom   1st   3  
Max   1st   8  
Jay   1st   4  
Mel   1st   7  
Bob   2nd   4  
Sue   2nd   2  
Tom   2nd   6  
Max   2nd   4  
Jay   2nd   7  
Mel   2nd   8  
Bob   3rd   3  
Sue   3rd   5  
Tom   3rd   6  
Max   3rd   3  
Jay   3rd   4  
Mel   3rd   6 

I want to count haw mant times each person scores more than 5?
into a new df2 that looks like this:

NAME    COUNT  
Bob     0  
Sue     1  
Tom     2  
Mary    1  
Jay     1  
Mel     3  

My attempts have been many - here is the latest

df2 = df.groupby('NAME')[['SCORE'] > 5].count().reset_index(name="count")
jpp
  • 159,742
  • 34
  • 281
  • 339
DLB
  • 101
  • 2

5 Answers5

2

First create boolean mask and then aggregate by sum- Trues values are processes like 1:

df2 = (df['SCORE'] > 5).groupby(df['NAME']).sum().astype(int).reset_index(name="count")
print (df2)
  NAME  count
0  Bob      0
1  Jay      1
2  Max      1
3  Mel      3
4  Sue      1
5  Tom      2

Detail:

print (df['SCORE'] > 5)

0     False
1      True
2     False
3      True
4     False
5      True
6     False
7     False
8      True
9     False
10     True
11     True
12    False
13    False
14     True
15    False
16    False
17     True
Name: SCORE, dtype: bool
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
2

Just using groupby and sum

df.assign(SCORE=df.SCORE.gt(5)).groupby('NAME')['SCORE'].sum().astype(int).reset_index()
Out[524]: 
  NAME  SCORE
0  Bob      0
1  Jay      1
2  Max      1
3  Mel      3
4  Sue      1
5  Tom      2

Or we using set_index with sum

df.set_index('NAME').SCORE.gt(5).sum(level=0).astype(int)
BENY
  • 317,841
  • 20
  • 164
  • 234
  • 2
    I don't like this answer. Using the dot notation to access the column `SCORE` is not idiomatic and will only work with strings without spaces. I also don't like the use of `assign` which only complicates matters for new users. – Ted Petrou Jun 06 '18 at 14:22
1

One way to do this is to write a custom groupby function where you take the scores of each group and sum up those that are greater than 5 like this:

df.groupby('NAME')['SCORE'].agg(lambda x: (x > 5).sum())


NAME
Bob    0
Jay    1
Max    1
Mel    3
Sue    1
Tom    2
Name: SCORE, dtype: int64
Ted Petrou
  • 59,042
  • 19
  • 131
  • 136
0

If you want counts as a dictionary, you can use collections.Counter:

from collections import Counter

c = Counter(df.loc[df['SCORE'] > 5, 'NAME'])

For a dataframe you can map counts from unique names:

res = pd.DataFrame({'NAME': df['NAME'].unique(), 'COUNT': 0})

res['COUNT'] = res['NAME'].map(c).fillna(0).astype(int)

print(res)

   COUNT NAME
0      0  Bob
1      1  Sue
2      2  Tom
3      1  Max
4      1  Jay
5      3  Mel
jpp
  • 159,742
  • 34
  • 281
  • 339
0

Filter dataframe first, then groupby with aggregation and reindex to fill missing values.

df[df['SCORE'] > 5].groupby('NAME')['SCORE'].size()\
                   .reindex(df['NAME'].unique(), fill_value=0)

Output:

NAME
Bob    0
Sue    1
Tom    2
Max    1
Jay    1
Mel    3
Name: SCORE, dtype: int64
Scott Boston
  • 147,308
  • 15
  • 139
  • 187