Say that my dataframe looks like this:
score label1 label2
.9923 a b
.3924 a b
.2923 a c
I want to remove all duplicates based on label1 and label2, keeping the row with the highest score for each (label1, label2) pair.
So the result would look like:
score label1 label2
.9923 a b
.2923 a c
I came up with:
df.sort_values(by=['score'], ascending=False).drop_duplicates(subset=['label1', 'label2'], keep='first')
It's meant to sort the rows by score in descending order and then drop duplicates, keeping the first row that appears for each (label1, label2) pair, which after sorting is the one with the highest score.
It seems to work, but I am not sure whether this is the proper way to do it. Will it work for all edge cases?
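For reference, here is a minimal reproducible sketch of what I am running. It rebuilds the sample data above and cross-checks the sort-then-drop result against a per-group maximum. The names `result`, `expected`, and `check` are only for illustration, and the groupby cross-check is my own sanity test, not necessarily the recommended approach.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "score": [0.9923, 0.3924, 0.2923],
    "label1": ["a", "a", "a"],
    "label2": ["b", "b", "c"],
})

# Sort by score (highest first), then keep the first row per (label1, label2)
result = (
    df.sort_values(by="score", ascending=False)
      .drop_duplicates(subset=["label1", "label2"], keep="first")
)

# Cross-check: the maximum score per (label1, label2) group
expected = df.groupby(["label1", "label2"], as_index=False)["score"].max()

# Align both results on the label columns and compare the scores
check = result.merge(expected, on=["label1", "label2"], suffixes=("_dedup", "_max"))
print(result)
print((check["score_dedup"] == check["score_max"]).all())
```

The edge cases I am unsure about are missing values and ties. As far as I understand, sort_values places NaN scores last by default (na_position='last'), so a NaN row would only be kept if every score in its group is NaN, and when two rows tie on score, which one is kept depends on the sort order.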