Pandas Dataframe get maximum with respect to other entries

Question

I have a DataFrame like this:

name	phase	value
BOB	1	.9
BOB	2	.05
BOB	3	.05
JOHN	2	.45
JOHN	3	.45
JOHN	4	.05
FRANK	1	.4
FRANK	3	.6

For each name, I want to find which entry in column 'phase' has the maximum value in column 'value'.
If more than one entry shares the same maximum value, keep the first or a random value for 'phase'.

Desired result table:

name	phase	value
BOB	1	.9
JOHN	2	.45
FRANK	3	.6

My approach was:

df.groupby(['name'])[['phase','value']].max()

but it returned incorrect values.

Corralien · Accepted Answer · 2022-04-23T23:11:45.413

4

You don't need to use groupby. Sort values by value and phase (adjust the order if necessary) and drop duplicates by name:

out = (df.sort_values(['value', 'phase'], ascending=[False, True])
         .drop_duplicates('name')
         .sort_index(ignore_index=True))
print(out)

# Output
    name  phase  value
0    BOB      1   0.90
1   JOHN      2   0.45
2  FRANK      3   0.60
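
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the table in the question, and the helper name `first_max_per_name` is only an illustrative assumption, not a pandas function. It also shows why the `groupby(...).max()` attempt goes wrong: `max()` is taken for each column independently, so the phase and the value in one output row can come from different input rows.

```python
import pandas as pd

# Frame rebuilt from the question's table (assumed default RangeIndex).
df = pd.DataFrame({
    "name": ["BOB"] * 3 + ["JOHN"] * 3 + ["FRANK"] * 2,
    "phase": [1, 2, 3, 2, 3, 4, 1, 3],
    "value": [0.9, 0.05, 0.05, 0.45, 0.45, 0.05, 0.4, 0.6],
})

# The question's attempt: each column gets its own maximum per group,
# so BOB ends up with phase 3 (largest phase) next to value 0.9.
wrong = df.groupby("name")[["phase", "value"]].max()

def first_max_per_name(frame):
    """Hypothetical helper: keep the highest-value row per name,
    breaking ties by the lowest phase, in original row order."""
    return (frame.sort_values(["value", "phase"], ascending=[False, True])
                 .drop_duplicates("name")
                 .sort_index(ignore_index=True))

out = first_max_per_name(df)
print(out)
```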

edited Apr 23 '22 at 23:11

answered Apr 23 '22 at 23:00

Corralien


Andrej Kesely · Answer 2 · 2022-04-23T23:02:14.110

3

Sort the DataFrame first, then take the first row of each group:

df = df.sort_values(
    by=["name", "value", "phase"], ascending=[True, False, True]
)

x = df.groupby("name", as_index=False).first()
print(x)

Prints:

    name  phase  value
0    BOB      1   0.90
1  FRANK      3   0.60
2   JOHN      2   0.45
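
One caveat with `first()`, sketched here under assumptions: `GroupBy.first()` takes the first non-null entry of each column, so if `phase` can be missing, the result may combine values from different rows. The two-row frame `toy` is invented purely for illustration and is not from the question; `groupby(...).head(1)` is shown as one possible alternative that keeps whole rows.

```python
import numpy as np
import pandas as pd

# Invented data (not from the question): the top-valued row has no phase.
toy = pd.DataFrame({
    "name": ["A", "A"],
    "phase": [np.nan, 2],
    "value": [0.9, 0.1],
})

toy = toy.sort_values(
    by=["name", "value", "phase"], ascending=[True, False, True]
)

# first() skips the NaN and pairs phase 2 with value 0.9.
mixed = toy.groupby("name", as_index=False).first()

# head(1) returns the first row of each group unchanged, NaN included.
whole = toy.groupby("name").head(1)

print(mixed)
print(whole)
```

With the data in the question there are no missing values, so both variants should give the same rows.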

edited Apr 23 '22 at 23:02

answered Apr 23 '22 at 22:57

Andrej Kesely



Better than `apply('first')` now :) – Corralien Apr 23 '22 at 23:04
@Corralien Yeah, I must shake off my `.apply` habit :D But the `.drop_duplicates` solution is better than this, IMHO – Andrej Kesely Apr 23 '22 at 23:05

score 1 · Answer 3 · answered Apr 23 '22 at 23:16


A possible solution that avoids sorting is to use groupby with idxmax, which returns the index label of the first maximum in each group:

df.loc[df.groupby('name', sort=False)['value'].idxmax()]

    name  phase  value
0    BOB      1   0.90
3   JOHN      2   0.45
7  FRANK      3   0.60
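
A self-contained sketch of the same idea, with the frame rebuilt by hand from the question's table. It assumes the index labels are unique (here the default RangeIndex), since `.loc` looks rows up by label; on ties, `idxmax` returns the label of the first occurrence.

```python
import pandas as pd

# Frame rebuilt from the question's table (assumed default RangeIndex).
df = pd.DataFrame({
    "name": ["BOB"] * 3 + ["JOHN"] * 3 + ["FRANK"] * 2,
    "phase": [1, 2, 3, 2, 3, 4, 1, 3],
    "value": [0.9, 0.05, 0.05, 0.45, 0.45, 0.05, 0.4, 0.6],
})

# One index label per name: the row where 'value' first reaches its maximum.
# sort=False keeps the groups in order of first appearance.
idx = df.groupby("name", sort=False)["value"].idxmax()

# Select those rows; the original index labels are preserved.
out = df.loc[idx]
print(out)
```

If a clean 0..n-1 index is wanted, `out.reset_index(drop=True)` can be chained at the end.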

answered Apr 23 '22 at 23:16

sammywemmy


score 1 · Answer 4 · answered Apr 23 '22 at 23:59


Sort by value in descending order, keep the first row per name, then restore the original row order:

out = df.sort_values('value', ascending=False).drop_duplicates('name').sort_index()
print(out)

# Output
    name  phase  value
0    BOB      1   0.90
3   JOHN      2   0.45
7  FRANK      3   0.60
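
A runnable sketch of this single-column sort, with the frame rebuilt by hand from the question's table. The `kind="stable"` argument is my addition, not part of the answer: the default sort algorithm is not guaranteed to keep tied rows in their original order, and as far as I can tell a stable sort does, so JOHN's two 0.45 rows should resolve to the earlier one. Treat that as an assumption to check on your pandas version.

```python
import pandas as pd

# Frame rebuilt from the question's table (assumed default RangeIndex).
df = pd.DataFrame({
    "name": ["BOB"] * 3 + ["JOHN"] * 3 + ["FRANK"] * 2,
    "phase": [1, 2, 3, 2, 3, 4, 1, 3],
    "value": [0.9, 0.05, 0.05, 0.45, 0.45, 0.05, 0.4, 0.6],
})

out = (
    df.sort_values("value", ascending=False, kind="stable")  # highest value first
      .drop_duplicates("name")                               # keep first row per name
      .sort_index()                                          # back to original row order
)
print(out)
```

Since the question accepts either the first or a random phase on ties, the default sort is also acceptable there; the stable variant only matters if "first" is required.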

answered Apr 23 '22 at 23:59

BENY

