Update Dataframe Column based on Grouping and Conditionals

Question

I have three columns which represent my data. I am trying to update the final column 'Val' based on the input of the first two.

I want the maximum of the 'Range', categorised by the 'Cat' column. Following that, I would like to update the 'Val' column based on the minimum value of the 'Val' column in that group.

Input
    Cat Range Val
0    1    0   1.0
2    1    2   1.5
3    1    3   2.0
5    1    5   9.0
6    2    0   1.5
7    2    5   2.0
8    2   10   0.5
9    2   15   2.8
10   2   20   9.0 

Desired Output (Only Lines 5 and 10 change):
    Cat Range Val
0    1    0   1.0
2    1    2   1.5
3    1    3   2.0
5    1    5   1.0
6    2    0   1.5
7    2    5   2.0
8    2   10   0.5
9    2   15   2.8
10   2   20   0.5

My elementary knowledge of pandas suggested this approach, however it does not work and I cannot seem to resolve it.

df.loc[df.groupby(['Cat'])['Range'].max(), 'Val'] = df.groupby('Cat')['Val'].min()

jezrael · Accepted Answer · 2020-11-25T07:56:30.203

You can use lambda function with numpy.where if need compare by Val column:

f = lambda x: np.where(x == x.max(), x.min(), x)
df['Val'] = df.groupby(['Cat'])['Val'].transform(f)
print (df)
    Cat  Range  Val
0     1      0  1.0
2     1      2  1.5
3     1      3  2.0
5     1      5  1.0
6     2      0  1.5
7     2      5  2.0
8     2     10  0.5
9     2     15  2.8
10    2     20  0.5

Use if need compare by max in Range column use GroupBy.transform if need replace all max values per groups:

m = df['Range'].eq(df.groupby(['Cat'])['Range'].transform('max'))
df.loc[m, 'Val'] = df.groupby('Cat')['Val'].transform('min')

print (df)
    Cat  Range  Val
0     1      0  1.0
2     1      2  1.5
3     1      3  2.0
5     1      5  1.0
6     2      0  1.5
7     2      5  2.0
8     2     10  0.5
9     2     15  2.8
10    2     20  0.5

Mayank Porwal · Answer 2 · 2020-11-25T07:56:00.460

If you need to replace only first max value of Range per group you can use df.loc and DataFrameGroupBy.idxmax :

In [3545]: (df.loc[df.groupby('Cat')['Range'].idxmax(), 'Val'] = 
            df.groupby('Cat')['Val'].transform('min'))

In [3546]: df
Out[3546]: 
    Cat  Range  Val
0     1      0  1.0
2     1      2  1.5
3     1      3  2.0
5     1      5  1.0
6     2      0  1.5
7     2      5  2.0
8     2     10  0.5
9     2     15  2.8
10    2     20  0.5

score 0 · Answer 3 · answered Nov 25 '20 at 07:52

0

df.merge(df.groupby(['Cat'])['Range'].max().reset_index(), how='inner').groupby('Cat')['Val'].min()

Is this what you're looking for?

answered Nov 25 '20 at 07:52

scsanty

146
1
7

Update Dataframe Column based on Grouping and Conditionals

3 Answers3