I have a DataFrame and I am using `.aggregate({'col1': np.sum})`, which sums the values in `col1`. Is it possible to perform a count instead, something like `.aggregate({'col1': some count function here})`?
Nicolás Ozimica

Mike El Jackson
- `{'col1': 'count'}` or `{'col1': 'size'}` or `{'col1': 'nunique'}` depending on your use case. – root Jan 16 '17 at 17:53
- Or `len` (the built-in), which I suggest is the most readable of the bunch. – Aleksey Bilogur Jan 16 '17 at 17:57
- `len` is typically slower than `'size'`, as it's a Python built-in instead of NumPy under the hood. – root Jan 16 '17 at 18:06
1 Answer
You can use `'size'`, `'count'`, or `'nunique'`, depending on your use case. The differences between them are:

- `'size'`: the count including `NaN` and repeat values.
- `'count'`: the count excluding `NaN` but including repeats.
- `'nunique'`: the count of unique values, excluding repeats and `NaN`.
For example, consider the following DataFrame:

    df = pd.DataFrame({'col0': list('aabbcc'), 'col1': [1, 1, 2, np.nan, 3, 4]})

      col0  col1
    0    a   1.0
    1    a   1.0
    2    b   2.0
    3    b   NaN
    4    c   3.0
    5    c   4.0
Then using the three functions described:

    df.groupby('col0')['col1'].agg(['size', 'count', 'nunique'])

          size  count  nunique
    col0
    a        2      2        1
    b        2      1        1
    c        2      2        2
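
The same string names can also be passed in the dict form used in the question. The following is a minimal sketch, assuming the same example DataFrame as above; the variable names `total`, `counts`, and `sizes` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same example data as above.
df = pd.DataFrame({'col0': list('aabbcc'), 'col1': [1, 1, 2, np.nan, 3, 4]})

# Ungrouped: count the non-NaN values in col1 across the whole DataFrame.
total = df.aggregate({'col1': 'count'})

# Grouped: map the column name to an aggregation name, per group of col0.
counts = df.groupby('col0').agg({'col1': 'count'})  # excludes NaN
sizes = df.groupby('col0').agg({'col1': 'size'})    # includes NaN

print(total)
print(counts)
print(sizes)
```

Group `b` is where the two differ: `'count'` skips the `NaN` row, while `'size'` includes it.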

root