2

I have three columns which represent my data. I am trying to update the final column 'Val' based on the input of the first two.

I want the maximum of the 'Range', categorised by the 'Cat' column. Following that, I would like to update the 'Val' column based on the minimum value of the 'Val' column in that group.

Input
    Cat Range Val
0    1    0   1.0
2    1    2   1.5
3    1    3   2.0
5    1    5   9.0
6    2    0   1.5
7    2    5   2.0
8    2   10   0.5
9    2   15   2.8
10   2   20   9.0 

Desired Output (Only Lines 5 and 10 change):
    Cat Range Val
0    1    0   1.0
2    1    2   1.5
3    1    3   2.0
5    1    5   1.0
6    2    0   1.5
7    2    5   2.0
8    2   10   0.5
9    2   15   2.8
10   2   20   0.5 

My elementary knowledge of pandas suggested this approach, however it does not work and I cannot seem to resolve it.

df.loc[df.groupby(['Cat'])['Range'].max(), 'Val'] = df.groupby('Cat')['Val'].min()
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
itsgupta
  • 123
  • 2

3 Answers3

2

You can use lambda function with numpy.where if need compare by Val column:

f = lambda x: np.where(x == x.max(), x.min(), x)
df['Val'] = df.groupby(['Cat'])['Val'].transform(f)
print (df)
    Cat  Range  Val
0     1      0  1.0
2     1      2  1.5
3     1      3  2.0
5     1      5  1.0
6     2      0  1.5
7     2      5  2.0
8     2     10  0.5
9     2     15  2.8
10    2     20  0.5

Use if need compare by max in Range column use GroupBy.transform if need replace all max values per groups:

m = df['Range'].eq(df.groupby(['Cat'])['Range'].transform('max'))
df.loc[m, 'Val'] = df.groupby('Cat')['Val'].transform('min')

print (df)
    Cat  Range  Val
0     1      0  1.0
2     1      2  1.5
3     1      3  2.0
5     1      5  1.0
6     2      0  1.5
7     2      5  2.0
8     2     10  0.5
9     2     15  2.8
10    2     20  0.5
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

If you need to replace only first max value of Range per group you can use df.loc and DataFrameGroupBy.idxmax :

In [3545]: (df.loc[df.groupby('Cat')['Range'].idxmax(), 'Val'] = 
            df.groupby('Cat')['Val'].transform('min'))

In [3546]: df
Out[3546]: 
    Cat  Range  Val
0     1      0  1.0
2     1      2  1.5
3     1      3  2.0
5     1      5  1.0
6     2      0  1.5
7     2      5  2.0
8     2     10  0.5
9     2     15  2.8
10    2     20  0.5
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
0
df.merge(df.groupby(['Cat'])['Range'].max().reset_index(), how='inner').groupby('Cat')['Val'].min()

Is this what you're looking for?

scsanty
  • 146
  • 1
  • 7