Want to drop duplicates based on one column but keep the first two rows

Question

Hi, I am dropping duplicates from a dataframe based on one column, "ID". So far I drop the duplicates and keep the first occurrence, but I want to keep the first (top) two occurrences instead of only one, so that I can compare the values of the first two rows in another column, "similarity_score".

data_2 = data.sort_values('similarity_score', ascending=False)

data_2.drop_duplicates(subset=['ID'], keep='first').reset_index()

score 0 · Answer 1 · answered May 12 '22 at 12:11

Sort the values, then use groupby + head to keep the top two rows for each ID:

data.sort_values('similarity_score', ascending=False).groupby('ID').head(2)
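As a minimal, self-contained sketch of the sort + groupby + head approach: the sample values below are made up for illustration, and only the column names "ID" and "similarity_score" come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "similarity_score": [0.9, 0.7, 0.8, 0.4, 0.6, 0.5],
})

# Sort by score (highest first), then keep at most two rows per ID.
top_two = (
    data.sort_values("similarity_score", ascending=False)
        .groupby("ID")
        .head(2)
        .reset_index(drop=True)
)
print(top_two)
```

head(2) keeps whole rows, so any other columns in the dataframe are preserved, and an ID that occurs only once simply contributes its single row.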

Alternatively, you can use groupby + nlargest, which returns the two highest scores per ID (as a Series of scores indexed by ID rather than the full rows):

data.groupby('ID')['similarity_score'].nlargest(2).droplevel(1)
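To show what the nlargest variant returns, and one possible way to put the two top scores side by side for the comparison mentioned in the question, here is a rough sketch. The sample data, the helper column "rank", and the "gap" column are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "similarity_score": [0.9, 0.7, 0.8, 0.4, 0.6, 0.5],
})

# Two highest scores per ID, returned as a Series indexed by ID.
top_scores = data.groupby("ID")["similarity_score"].nlargest(2).droplevel(1)
print(top_scores)

# Side-by-side comparison: number the rows within each ID after sorting,
# keep ranks 1 and 2, and spread them into columns.
ranked = data.sort_values("similarity_score", ascending=False)
ranked["rank"] = ranked.groupby("ID").cumcount() + 1  # illustrative helper column
wide = ranked[ranked["rank"] <= 2].pivot(
    index="ID", columns="rank", values="similarity_score"
)
wide["gap"] = wide[1] - wide[2]  # NaN when an ID has only one row
print(wide)
```

The wide layout gives one row per ID, so the difference between the best and second-best score can be read or filtered directly.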
