pandas add column to groupby dataframe

Question

I have this simple dataframe df:

import pandas as pd

df = pd.DataFrame({'c':[1,1,1,2,2,2,2],'type':['m','n','o','m','m','n','n']})

My goal is to count the values of type for each c, and then add a column with the size of each c group. So, starting with:

In [27]: g = df.groupby('c')['type'].value_counts().reset_index(name='t')

In [28]: g
Out[28]: 
   c type  t
0  1    m  1
1  1    n  1
2  1    o  1
3  2    m  2
4  2    n  2

That solves the first problem. I can also compute the group sizes:

In [29]: a = df.groupby('c').size().reset_index(name='size')

In [30]: a
Out[30]: 
   c  size
0  1     3
1  2     4

How can I add the size column directly to the first dataframe? So far I have used map:

In [31]: a.index = a['c']

In [32]: g['size'] = g['c'].map(a['size'])

In [33]: g
Out[33]: 
   c type  t  size
0  1    m  1     3
1  1    n  1     3
2  1    o  1     3
3  2    m  2     4
4  2    n  2     4

This works, but is there a more straightforward way to do it?

EdChum · Accepted Answer · 2016-05-12T14:33:33.260

44

Use transform to add a column from a groupby aggregation back to the original df. transform returns a Series whose index is aligned with the original df:

In [123]:
g = df.groupby('c')['type'].value_counts().reset_index(name='t')
g['size'] = df.groupby('c')['type'].transform('size')
g

Out[123]:
   c type  t  size
0  1    m  1     3
1  1    n  1     3
2  1    o  1     3
3  2    m  2     4
4  2    n  2     4
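One caveat worth spelling out: the Series returned by transform is indexed like df (7 rows), while g has only 5 rows, so the assignment above appears to match rows purely by index label. It gives the right answer here because df happens to be sorted by c. A minimal sketch of a variant that does not depend on that, assuming the per-type counts in column t are available, is to sum t within each c, since those counts add up to the number of rows for that c in df:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

# Per-(c, type) counts, as in the question.
g = df.groupby('c')['type'].value_counts().reset_index(name='t')

# The counts of all types within one c sum to the number of rows of that c
# in df, so the group size can be derived from g alone, without relying on
# g and df sharing row labels.
g['size'] = g.groupby('c')['t'].transform('sum')
print(g)
```

Note that this uses 'sum' rather than 'size': applying 'size' to g would count the rows of g per c (the number of distinct types), not the rows of df.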

edited May 12 '16 at 14:33

answered May 12 '16 at 14:27


Actually in this case we should change it to `g['size'] = g.groupby('c')['t'].transform('size')` since I want to keep the `value_counts()`. – Fabio Lamanna May 12 '16 at 14:31

this is a much better answer than: https://stackoverflow.com/questions/30244952/python-pandas-create-new-column-with-groupby-sum – vagabond Aug 11 '17 at 10:54
The link to transform has changed slightly, this is the new one: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#transformation – Nicolas Forstner Aug 13 '20 at 09:18

jezrael · Answer 2 · 2020-06-07T11:33:30.713

18

Another solution uses transform with len:

df['size'] = df.groupby('c')['type'].transform(len)
print(df)
   c type size
0  1    m    3
1  1    n    3
2  1    o    3
3  2    m    4
4  2    m    4
5  2    n    4
6  2    n    4

Another solution with Series.map and Series.value_counts:

df['size'] = df['c'].map(df['c'].value_counts())
print(df)
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
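The same map idea can also be applied to the counted frame from the question rather than to df itself. The following is a sketch only, reusing the question's names g and t; map looks up each value of c in the Series returned by value_counts, so the match is by key and not by row position:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

# Per-(c, type) counts, as in the question.
g = df.groupby('c')['type'].value_counts().reset_index(name='t')

# df['c'].value_counts() is a Series indexed by the distinct c values;
# map() uses each c in g as a lookup key into that Series.
g['size'] = g['c'].map(df['c'].value_counts())
print(g)
```

This avoids the intermediate frame a and the manual index assignment used in the question.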

edited Jun 07 '20 at 11:33

answered May 12 '16 at 14:29


Could you please briefly explain why you removed the first part of your original answer? (I found it useful for my goal, which is different from the OP's, but the question's title describes it well, which is how I got here.) – Oren Milman Jun 07 '20 at 09:07

Mykola Zotko · Answer 3 · 2022-01-23T10:25:44.637

1

You can create the groupby object once and use it multiple times:

g = df.groupby('c')['type']

df = g.value_counts().reset_index(name='counts')
df['size'] = g.transform('size')

or

g.value_counts().reset_index(name='counts').assign(size=g.transform('size'))

Output:

   c type  counts  size
0  1    m       1     3
1  1    n       1     3
2  1    o       1     3
3  2    m       2     4
4  2    n       2     4
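If the rows of df are not already ordered by c, matching the transform result to the counted frame by row label may not line up. As a hedged alternative (not part of the answer above), the same groupby object can be reused and the two results joined on the key c with merge; the names grouped, counts, sizes and out are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

# Build the groupby object once and reuse it for both aggregations.
grouped = df.groupby('c')['type']

# One row per (c, type) pair with its count.
counts = grouped.value_counts().reset_index(name='counts')

# One row per c with the number of rows in that group.
sizes = grouped.size().reset_index(name='size')

# Join on the key column, so row order in df does not matter.
out = counts.merge(sizes, on='c')
print(out)
```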

edited Jan 23 '22 at 10:25

answered Jan 22 '22 at 16:56

