In the stage column I have 4 values:
I have duplicate rows in this dataframe and I want to drop them, for example:
I want to keep row #8015. I don't have 2 rows with both the same stage and the same tweet_id, for example:
I tried this solution:
twitter_archive = twitter_archive.sort_values(by='stage', ascending=False).drop_duplicates(subset='tweet_id', keep='first').sort_index().reset_index(drop=True)
which I found in this solution, but then I lost 10 doggo entries, although I sorted my values and kept the first occurrence.
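
The effect can be reproduced on a small made-up frame. This is only a sketch: the tweet_id values, the rows, and the use of the string 'None' for a missing stage are assumptions for illustration, not the real data. It shows that drop_duplicates with subset='tweet_id' keeps exactly one row per tweet_id, so when a tweet has both a doggo row and a pupper row, the descending sort puts pupper first and the doggo row is dropped.

```python
import pandas as pd

# Hypothetical data: tweet 2 has both a 'doggo' and a 'pupper' row.
df = pd.DataFrame({
    'tweet_id': [1, 1, 2, 2, 2, 3],
    'stage':    ['None', 'doggo', 'None', 'doggo', 'pupper', 'None'],
})

# Same chain as in the question, applied to the toy frame.
result = (
    df.sort_values(by='stage', ascending=False)       # 'pupper' > 'doggo' > 'None'
      .drop_duplicates(subset='tweet_id', keep='first')  # one row per tweet_id
      .sort_index()
      .reset_index(drop=True)
)

print((df['stage'] == 'doggo').sum())      # 2 doggo rows before
print((result['stage'] == 'doggo').sum())  # 1 doggo row after
print(result)
```

On this toy frame, tweet 2 ends up as pupper only, which matches the kind of loss described above.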