DataFrame: add column with the size of a group

Question

I have the following dataframe:

   fsq  digits digits_type
0    1       1         odd
1    2       1         odd
2    3       1         odd
3   11       2        even
4   22       2        even
5  101       3         odd
6  111       3         odd

and I want to add a last column, count, containing the number of fsq rows that belong to each digits group, i.e.:

   fsq  digits digits_type  count
0    1       1         odd      3
1    2       1         odd      3
2    3       1         odd      3
3   11       2        even      2
4   22       2        even      2
5  101       3         odd      2
6  111       3         odd      2

This is because there are 3 fsq rows with digits equal to 1, 2 fsq rows with digits equal to 2, and so on.

TomAugspurger · Accepted Answer · 2014-04-11T20:27:42.307

22

In [395]: df['count'] = df.groupby('digits')['fsq'].transform(len)

In [396]: df
Out[396]: 
   fsq  digits digits_type  count
0    1       1         odd      3
1    2       1         odd      3
2    3       1         odd      3
3   11       2        even      2
4   22       2        even      2
5  101       3         odd      2
6  111       3         odd      2

[7 rows x 4 columns]
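
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch. The DataFrame construction is my own reconstruction of the data shown in the question, not something taken from the original post.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption
# about how the original frame was built).
df = pd.DataFrame({
    "fsq": [1, 2, 3, 11, 22, 101, 111],
    "digits": [1, 1, 1, 2, 2, 3, 3],
    "digits_type": ["odd", "odd", "odd", "even", "even", "odd", "odd"],
})

# transform() returns a result aligned with the original index, so each row
# receives the length of the 'digits' group it belongs to.
df["count"] = df.groupby("digits")["fsq"].transform(len)

print(df)
```

The key point is that transform broadcasts the per-group value back onto every row of the group, which is why the result can be assigned directly as a new column.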

edited Apr 11 '14 at 20:27

answered Apr 11 '14 at 17:26

TomAugspurger


Small question: how do you paste IPython input/output into the Stack Overflow code format? – luffe Apr 11 '14 at 17:31
Thanks DSM. I just copied from the terminal and pasted it here. Then select the text and hit CTRL-k to format it as code. – TomAugspurger Apr 11 '14 at 20:27

score 8 · Answer 2 · answered Jul 13 '18 at 09:48

In general, you should use the methods pandas itself defines where possible, since they are often more efficient than passing a plain Python function.

In this case you can pass the string 'size' to transform, in the same vein as df.groupby('digits')['fsq'].size():

df = pd.concat([df]*10000)

%timeit df.groupby('digits')['fsq'].transform('size')  # 3.44 ms per loop
%timeit df.groupby('digits')['fsq'].transform(len)     # 11.6 ms per loop
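
The %timeit lines above only work inside IPython. As a rough sketch of the same comparison in a plain script, the snippet below uses the standard-library timeit module instead. The sample frame and the variable names (big, grouped, by_name, by_len) are my own assumptions, and the timings will differ by machine and pandas version, so none are shown.

```python
import timeit

import pandas as pd

# Sample data reconstructed from the question (an assumption).
df = pd.DataFrame({
    "fsq": [1, 2, 3, 11, 22, 101, 111],
    "digits": [1, 1, 1, 2, 2, 3, 3],
    "digits_type": ["odd", "odd", "odd", "even", "even", "odd", "odd"],
})

# Enlarge the frame as in the answer; ignore_index avoids duplicate labels.
big = pd.concat([df] * 10000, ignore_index=True)
grouped = big.groupby("digits")["fsq"]

# Both spellings should yield the same per-row group sizes.
by_name = grouped.transform("size")
by_len = grouped.transform(len)
print("same result:", bool((by_name == by_len).all()))

# Time each variant over a handful of runs.
t_name = timeit.timeit(lambda: grouped.transform("size"), number=10)
t_len = timeit.timeit(lambda: grouped.transform(len), number=10)
print(f"transform('size'): {t_name / 10 * 1e3:.2f} ms per loop")
print(f"transform(len):    {t_len / 10 * 1e3:.2f} ms per loop")
```

Checking that the two results match before timing them is a cheap way to confirm you are comparing equivalent operations.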
