Say I have below dataframe:
A B C D E
3 2 1 4 5
3 2 1 2 3
4 5 6 7 8
4 5 6 9 8
9 3 8 5 4
I would like to drop duplicates based on columns A, B and C, keeping the rows for which column E is the highest. And if the values in column E are the same, then keeping the rows for which the column D is the highest.
So above dataframe would become:
A B C D E
3 2 1 4 5
4 5 6 9 8
9 3 8 5 4
I saw a beginning of answer there: python pandas: Remove duplicates by columns A, keeping the row with the highest value in column B but unfortunately I can't find out how to handle the if the values are the same in column E then keep the highest from column D :/
(I am running this code on a quite large dataset)
Any help appreciated !