Drop duplicates based on subset of columns keeping the rows with highest value in col E & if values equal in E the rows with highest value in col D

Question

Say I have the dataframe below:

I would like to drop duplicates based on columns A, B and C, keeping the row for which column E is the highest. If the values in column E are equal, I want to keep the row for which column D is the highest.

So the above dataframe would become:

I found the beginning of an answer here: python pandas: Remove duplicates by columns A, keeping the row with the highest value in column B, but unfortunately I can't work out how to handle the case where the values in column E are equal, so that the row with the highest value in column D is kept :/

(I am running this code on quite a large dataset.)

Any help appreciated!

If it is fast enough, you can sort the frame first: `df.sort_values(["E", "D"], ascending=[False, False]).drop_duplicates(subset=list("ABC"))`. But there might be a better solution as usual. — Mustafa Aydın, Jun 11 '21 at 13:45

score 1 · Accepted Answer · answered Jun 11 '21 at 16:28


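You can sort the frame by E and then D, both in descending order, and then drop the duplicates. Since `drop_duplicates` keeps the first occurrence by default, the row that survives in each A, B, C group is the one with the highest E, with ties broken by the highest D: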

    df.sort_values(["E", "D"], ascending=[False, False]).drop_duplicates(subset=list("ABC"))
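As a minimal sketch of how this behaves, here is the same call on a small made-up frame. The sample values are an assumption for illustration only (the question's original data is not shown), and the trailing `sort_index()` is an optional extra that restores the original row order.

```python
import pandas as pd

# Hypothetical sample data, not the frame from the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2],
    "B": [1, 1, 1, 3, 3],
    "C": [1, 1, 1, 4, 4],
    "D": [5, 7, 9, 2, 8],
    "E": [3, 3, 1, 6, 2],
})

result = (
    # Highest E first; within equal E, highest D first.
    df.sort_values(["E", "D"], ascending=[False, False])
      # keep="first" is the default, so the top row of each A, B, C group survives.
      .drop_duplicates(subset=["A", "B", "C"])
      # Optional: put the surviving rows back in their original order.
      .sort_index()
)

print(result)
```

In the (1, 1, 1) group, two rows tie on E = 3, so the one with D = 7 is kept. In the (2, 3, 4) group, the row with E = 6 wins outright even though its D is lower.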


Mustafa Aydın

