Pandas, most secure way to drop duplicates based on two labels, keeping the row with highest third label score

Asked Nov 07 '20 at 22:37

Active Nov 07 '20 at 22:47

Viewed 25 times

Say that my dataframe looks like this:

score   label1  label2
.9923   a       b
.3924   a       b
.2923   a       c

I want to remove all duplicates based on label1 and label2, keeping the highest value from score.

So the result would look like:

score   label1  label2
.9923   a       b
.2923   a       c

I came up with

df.sort_values(by=['score'], ascending=False).drop_duplicates(subset=['label1', 'label2'], keep='first')

It's meant to sort the data by score in descending order, and then drop duplicates on label1 and label2, keeping the first row that comes up for each pair, which is the one with the highest score.

It seems to work, though I am not sure if this is the proper way to do it; will this work for all edge cases?
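For reference, here is a minimal reproducible sketch of the above, using the sample data from this question. The names `df` and `result` are just for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "score": [0.9923, 0.3924, 0.2923],
    "label1": ["a", "a", "a"],
    "label2": ["b", "b", "c"],
})

# Sort so the highest score comes first, then keep the first row
# seen for each (label1, label2) pair
result = (
    df.sort_values(by="score", ascending=False)
      .drop_duplicates(subset=["label1", "label2"], keep="first")
)

print(result)
```

On this data, the rows kept are the .9923 row for (a, b) and the .2923 row for (a, c).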

edited Nov 07 '20 at 22:47

asked Nov 07 '20 at 22:37

SantoshGupta7


Your solution does reliably solve your question. – Michael Szczesny Nov 07 '20 at 22:43
Which version of Python are you using in this? – Nov 07 '20 at 22:51
I am using Python 3.6 – SantoshGupta7 Nov 07 '20 at 22:51
yes it answers it exactly – SantoshGupta7 Nov 07 '20 at 23:02
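
On the edge cases asked about in the question, two that seem worth checking are missing scores and tied scores. The sketch below uses made-up data (not from the question) and assumes default pandas behaviour: `sort_values` places NaN last by default (`na_position='last'`) even when `ascending=False`, so a NaN row should only be kept when its label pair has no real score. Passing `kind='mergesort'` requests a stable sort, which should make the choice among tied scores deterministic instead of depending on the sorting algorithm.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a NaN score, a tie, and a group with only NaN
df = pd.DataFrame({
    "score": [np.nan, 0.5, 0.5, 0.2, np.nan],
    "label1": ["a", "a", "a", "a", "z"],
    "label2": ["b", "b", "b", "c", "z"],
})

deduped = (
    # NaN scores are sorted to the end by default; mergesort is stable
    df.sort_values(by="score", ascending=False, kind="mergesort")
      .drop_duplicates(subset=["label1", "label2"], keep="first")
)

print(deduped)
```

If rows with a missing score should never be kept, calling `df.dropna(subset=['score'])` before sorting is one option.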

0 Answers