Say my dataframe looks like this:

score label1 label2
.9923   a      b
.3924   a      b
.2923   a      c

I want to remove all duplicates based on label1 and label2, keeping the highest value from score.

So the result would look like this:

score label1 label2
.9923   a      b
.2923   a      c

I came up with:

df.sort_values(by=['score'], ascending=False).drop_duplicates(subset=['label1', 'label2'], keep='first')

It's meant to sort the data by score in descending order and then drop duplicates, keeping the first row that comes up for each (label1, label2) pair.
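For reference, here is a minimal reproducible sketch of that approach on the sample data above. It assumes the column is named `score` in lowercase, as in the table, and that the scores are floats; the variable name `result` is only for illustration.

```python
import pandas as pd

# Sample data from the question (column names assumed to match the table).
df = pd.DataFrame({
    "score": [0.9923, 0.3924, 0.2923],
    "label1": ["a", "a", "a"],
    "label2": ["b", "b", "c"],
})

# Sort so the highest score comes first, then keep the first row
# seen for each (label1, label2) combination.
result = (
    df.sort_values(by="score", ascending=False)
      .drop_duplicates(subset=["label1", "label2"], keep="first")
)

print(result)
```

One thing I noticed while checking: `sort_values` places NaN scores last by default (`na_position='last'`), so a row with a missing score should only survive if every row in its group has a missing score.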

It seems to work, though I am not sure if this is the proper way to do it; will this work for all edge cases?

SantoshGupta7