
Hi, I am dropping duplicates from a dataframe based on one column, "ID". So far I drop the duplicates and keep only the first occurrence, but I want to keep the first (top) two occurrences per ID instead, so that I can compare the values of another column, "similarity_score", across those two rows.

data_2 = data.sort_values('similarity_score', ascending=False)

data_2.drop_duplicates(subset=['ID'], keep='first').reset_index()

sandy

1 Answer


Sort the values in descending order, then use groupby + head to keep the top two rows per ID:

data.sort_values('similarity_score', ascending=False).groupby('ID').head(2)
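As a quick check, here is a minimal sketch of the sort + groupby + head approach on a small made-up frame. The sample values are hypothetical; only the column names "ID" and "similarity_score" come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "similarity_score": [0.9, 0.7, 0.8, 0.4, 0.6, 0.5],
})

# Sort by score (highest first), then keep the first two rows of each ID group.
# head(2) preserves the sorted row order, so rows are not regrouped by ID.
top2 = (
    data.sort_values("similarity_score", ascending=False)
        .groupby("ID")
        .head(2)
        .reset_index(drop=True)
)
print(top2)
```

ID 1 has three rows, so its lowest score (0.7) is dropped. ID 3 has a single row, which is kept as-is.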

Alternatively, you can use groupby + nlargest, which gives the two highest scores per ID. Note that it returns a Series of scores indexed by ID rather than the full rows:

data.groupby('ID')['similarity_score'].nlargest(2).droplevel(1)
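Since the goal in the question is to compare the top two scores, here is a rough sketch of how the nlargest result could be used for that. The sample values and the `gap` calculation are my own illustration, not something from the question, and the column is assumed to be named "similarity_score" as in the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "similarity_score": [0.9, 0.7, 0.8, 0.4, 0.6, 0.5],
})

# Two largest scores per ID; droplevel(1) removes the original row index,
# leaving a Series indexed by ID only.
top2_scores = data.groupby("ID")["similarity_score"].nlargest(2).droplevel(1)
print(top2_scores)

# Illustrative comparison: best score minus second-best score for each ID.
# IDs with only one row have nothing to compare against, so they get NaN.
gap = top2_scores.groupby(level=0).agg(
    lambda s: s.iloc[0] - s.iloc[1] if len(s) > 1 else float("nan")
)
print(gap)
```

If you need the other columns as well, the sort + groupby + head version is the more convenient of the two, since it keeps whole rows.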
Shubham Sharma