Suppose I have a DataFrame like this:

    A      B      C       
0  foo    one     1
1  bar    one     2
2  foo    two     1
3  bar  three     2
4  foo    two     3
5  bar    two     5
6  foo    one     2
7  foo  three     5
8  bar    one     4
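
For reproducibility, here is a minimal sketch that builds the frame above. The values are copied from the table; the variable name `df` matches the code further down.

```python
import pandas as pd

# Rebuild the example frame row by row from the table above
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo", "bar"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three", "one"],
    "C": [1, 2, 1, 2, 3, 5, 2, 5, 4],
})
print(df)
```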

I want to group by 'B' and normalize the 'C' column within each 'B' group. I'd like a simple normalization that divides each value by its group maximum, x / max(x):

grouped_b = df.groupby('B')

def norm(value):
    return value/value.max()

norm_B = grouped_b['C'].agg(norm)

The result should look like this:

    A      B      C       
0  foo    one    0.25
1  bar    one    0.5
2  foo    two    0.2
3  bar  three    0.4
4  foo    two    0.6
5  bar    two     1
6  foo    one    0.5
7  foo  three     1
8  bar    one     1
ZHICHEN GUO

2 Answers

Use GroupBy.transform, which returns a Series of the same size as the original df:

grouped_b = df.groupby('B')

def norm(value):
    return value/value.max()

df['C'] = grouped_b['C'].transform(norm)

print(df)
     A      B     C
0  foo    one  0.25
1  bar    one  0.50
2  foo    two  0.20
3  bar  three  0.40
4  foo    two  0.60
5  bar    two  1.00
6  foo    one  0.50
7  foo  three  1.00
8  bar    one  1.00

You can also use a lambda function:

df['C'] = df.groupby('B')['C'].transform(lambda x: x / x.max())
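
To illustrate why transform fits here, a small sketch (the frame is rebuilt from the question's table so the snippet runs on its own; the names `per_group_max` and `normalized` are just for illustration). An aggregation such as 'max' collapses each group to a single value, while transform returns one value per original row, aligned on the original index, so it can be assigned straight back to a column.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo", "bar"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three", "one"],
    "C": [1, 2, 1, 2, 3, 5, 2, 5, 4],
})

# Aggregation: one value per group, indexed by the group key 'B'
per_group_max = df.groupby("B")["C"].agg("max")
print(per_group_max)

# Transform: one value per original row, aligned on df's index
normalized = df.groupby("B")["C"].transform(lambda x: x / x.max())
print(normalized)

# Same index as df, so the result can be stored as a column directly
print(normalized.index.equals(df.index))
```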
jezrael

Using transform with the built-in 'max', then dividing in place:

df['C'] /= df.groupby('B')['C'].transform('max')
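
As a quick sanity check, the sketch below (assuming `df` is the frame from the question, rebuilt here so it runs standalone; `via_builtin` and `via_custom` are illustrative names) compares dividing by transform('max') with the custom-function version. Passing the string 'max' lets pandas use its built-in group maximum instead of calling a Python function per group, which is usually faster on large frames.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo", "bar"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three", "one"],
    "C": [1, 2, 1, 2, 3, 5, 2, 5, 4],
})

# Group maximum repeated for every row of its group
group_max = df.groupby("B")["C"].transform("max")

# Divide each value by its group's maximum
via_builtin = df["C"] / group_max

# Same normalization written with a custom function
via_custom = df.groupby("B")["C"].transform(lambda x: x / x.max())

# Raises AssertionError if the two results differ
pd.testing.assert_series_equal(via_builtin, via_custom, check_names=False)
```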
BENY