I have a dataset with 3 target classes: ‘Yes’, ‘Maybe’, and ‘No’.
Unique_id target
111 Yes
111 Maybe
111 No
112 No
112 Maybe
113 No
I want to drop duplicate rows based on unique_id. But drop_duplicates only keeps the first or last row, and I want to choose which row to keep based on the following conditions:
1) If a unique_id has all 3 classes (Yes, Maybe, and No), keep only the ‘Yes’ row.
2) If a unique_id has 2 classes (Maybe and No), keep only the ‘Maybe’ row.
3) If a unique_id has only ‘No’, keep the ‘No’ row.
I tried mapping the target classes to a rank (Yes=1, Maybe=2, No=3), sorting on it with sort_values, and then dropping the duplicates.
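For reference, here is a minimal sketch of that sort-then-drop attempt, assuming pandas and a DataFrame named df with the two columns shown above. The names priority and _rank are only illustrative, not from my actual code.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Unique_id": [111, 111, 111, 112, 112, 113],
    "target": ["Yes", "Maybe", "No", "No", "Maybe", "No"],
})

# Lower number = higher priority (Yes=1, Maybe=2, No=3).
priority = {"Yes": 1, "Maybe": 2, "No": 3}

result = (
    df.assign(_rank=df["target"].map(priority))  # temporary helper column (illustrative name)
      .sort_values(["Unique_id", "_rank"])       # best-ranked row first within each id
      .drop_duplicates(subset="Unique_id", keep="first")
      .drop(columns="_rank")
      .reset_index(drop=True)
)
print(result)
```

On the sample data this leaves one row per Unique_id: 111 Yes, 112 Maybe, 113 No.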
Desired output:
Unique_id target
111 Yes
112 Maybe
113 No
I’m wondering whether there is a better way to do this.
Any suggestions would be appreciated. Thanks!