I have this simple dataframe df:

import pandas as pd

df = pd.DataFrame({'c':[1,1,1,2,2,2,2],'type':['m','n','o','m','m','n','n']})

My goal is to count the values of type for each c, and then add a column with the size of each c group. Starting with:

In [27]: g = df.groupby('c')['type'].value_counts().reset_index(name='t')

In [28]: g
Out[28]: 
   c type  t
0  1    m  1
1  1    n  1
2  1    o  1
3  2    m  2
4  2    n  2

The first problem is solved. I can then also compute the size of each group:

In [29]: a = df.groupby('c').size().reset_index(name='size')

In [30]: a
Out[30]: 
   c  size
0  1     3
1  2     4

How can I add the size column directly to the first dataframe? So far I have used map, after setting c as the index of a:

In [31]: a.index = a['c']

In [32]: g['size'] = g['c'].map(a['size'])

In [33]: g
Out[33]: 
   c type  t  size
0  1    m  1     3
1  1    n  1     3
2  1    o  1     3
3  2    m  2     4
4  2    n  2     4

which works, but is there a more straightforward way to do this?

Fabio Lamanna

3 Answers

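Use transform to add a column computed from a groupby aggregation back to a dataframe. transform returns a Series whose index is aligned to the dataframe that was grouped. Because g has a different index from the original df, group g itself and sum the counts in t within each c, so that the result lines up with the rows of g:

In [123]:
g = df.groupby('c')['type'].value_counts().reset_index(name='t')
g['size'] = g.groupby('c')['t'].transform('sum')
g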

Out[123]:
   c type  t  size
0  1    m  1     3
1  1    n  1     3
2  1    o  1     3
3  2    m  2     4
4  2    n  2     4
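
The index alignment matters whenever the frame being assigned to has a different index from the frame that was grouped. The following is a minimal sketch on a hypothetical second dataset (not from the question), chosen so that row labels no longer match up by chance. The names df2, g2, misaligned and by_label are illustrative only:

```python
import pandas as pd

# Hypothetical data: three rows with c == 1 and a single type,
# so the counts table has fewer rows than the original frame.
df2 = pd.DataFrame({'c': [1, 1, 1, 2], 'type': ['m', 'm', 'm', 'n']})

g2 = df2.groupby('c')['type'].value_counts().reset_index(name='t')

# transform on the original frame is indexed 0..3. Assigning it to the
# two-row g2 would align on labels 0 and 1, which both belong to c == 1.
misaligned = df2.groupby('c')['type'].transform('size')
by_label = misaligned.reindex(g2.index)  # what label alignment would pick

# Summing the per-type counts within each c is indexed like g2 itself
# and recovers the group size (value_counts skips missing types by default).
g2['size'] = g2.groupby('c')['t'].transform('sum')
print(g2)
```

Here by_label holds 3 for both rows, while the sum over t gives 3 for c == 1 and 1 for c == 2, which is the size of each group.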
EdChum
  • Actually in this case we should change it to `g['size'] = g.groupby('c')['t'].transform('size')` since I want to keep the `value_counts()`. – Fabio Lamanna May 12 '16 at 14:31
  • this is a much better answer than: https://stackoverflow.com/questions/30244952/python-pandas-create-new-column-with-groupby-sum – vagabond Aug 11 '17 at 10:54
  • The link to transform has changed slightly, this is the new one: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#transformation – Nicolas Forstner Aug 13 '20 at 09:18

Another solution, using transform with len:

df['size'] = df.groupby('c')['type'].transform(len)
print(df)
   c type size
0  1    m    3
1  1    n    3
2  1    o    3
3  2    m    4
4  2    m    4
5  2    n    4
6  2    n    4

Another solution with Series.map and Series.value_counts:

df['size'] = df['c'].map(df['c'].value_counts())
print(df)
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
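
As a quick check that these approaches agree, the sketch below computes the per-row group size three ways on the question's data. The variable names (grouped, by_size, by_len, by_map) are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

grouped = df.groupby('c')['type']

# Named aggregation, broadcast back to every row of its group.
by_size = grouped.transform('size')

# Python callable applied to each group, also broadcast per row.
by_len = grouped.transform(len)

# Look up each value of c in the table of counts of c.
by_map = df['c'].map(df['c'].value_counts())

print(by_size.tolist())
print(by_len.tolist())
print(by_map.tolist())
```

All three give one value per row of df, indexed like df, so any of them can be assigned directly as a new column of df.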
jezrael
  • Could you please explain shortly why you removed the first part of your original answer? (I found it useful for my goal, which is different from the OP's, but the question's title describes it well (which is how i got here)) – Oren Milman Jun 07 '20 at 09:07

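You can create the groupby object once and use it multiple times. Note that g.transform('size') is indexed like the original df rather than like the counts table, so map the group sizes from g.size() onto c instead:

g = df.groupby('c')['type']

df = g.value_counts().reset_index(name='counts')
df['size'] = df['c'].map(g.size())

or

g.value_counts().reset_index(name='counts').assign(size=lambda d: d['c'].map(g.size()))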

Output:

   c type  counts  size
0  1    m       1     3
1  1    n       1     3
2  1    o       1     3
3  2    m       2     4
4  2    n       2     4
Mykola Zotko