How to drop duplicates in one column based on values in 2 other columns in DataFrame in Python Pandas?

Question

I have a pandas DataFrame like the one below:

data types:

I need to drop duplicates in the DataFrame above according to this rule:

If value in ID in my DF is duplicated -> drop rows where (TYPE = B and TG_A = 1) or (TYPE = A and TG_B = 1)

So, as a result I need something like below:

ID  | TYPE | TG_A | TG_B
----|------|------|-----
111 | A    | 1    | 0
222 | A    | 1    | 0
333 | B    | 0    | 1

How can I do that in pandas?

score 2 · Accepted Answer · answered Dec 12 '22 at 15:04

You can build two boolean masks for the rows to drop, then use groupby.idxmax on the negated mask to get the index of the first non-matching row for each ID:

m1 = df['TYPE'].eq('B') & df['TG_A'].eq(1)
m2 = df['TYPE'].eq('A') & df['TG_B'].eq(1)

out = df.loc[(~(m1|m2)).groupby(df['ID']).idxmax()]

Output:

    ID TYPE  TG_A  TG_B
0  111    A     1     0
3  222    A     1     0
4  333    B     0     1
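For anyone who wants to run this end to end, here is a minimal sketch. The question's input table did not survive, so the DataFrame below is an assumption: only rows 0, 3 and 4 are taken from the output above, and rows 1, 2 and 5 are hypothetical filler chosen so that every ID is duplicated.

```python
import pandas as pd

# Hypothetical input: rows 0, 3 and 4 come from the output above,
# rows 1, 2 and 5 are assumed for illustration only.
df = pd.DataFrame({
    "ID":   [111, 111, 222, 222, 333, 333],
    "TYPE": ["A", "B", "B", "A", "B", "A"],
    "TG_A": [1, 1, 1, 1, 0, 0],
    "TG_B": [0, 0, 0, 0, 1, 1],
})

# Rows that the rule says should be dropped.
m1 = df["TYPE"].eq("B") & df["TG_A"].eq(1)
m2 = df["TYPE"].eq("A") & df["TG_B"].eq(1)

# True marks a row worth keeping.
keep = ~(m1 | m2)

# Per ID, idxmax returns the index label of the first True
# (or of the first row if the group has no True at all).
first_keep = keep.groupby(df["ID"]).idxmax()

out = df.loc[first_keep]
print(out.index.tolist())  # [0, 3, 4]
```

Note that idxmax always returns one label per group, so this approach keeps exactly one row per ID, even when every row of that ID matches a drop condition.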

score 1 · Answer 2 · answered Dec 12 '22 at 15:09


df[df['TYPE'].eq('A').eq(df['TG_A'])]

result

    ID  TYPE    TG_A    TG_B
0   111 A       1       0
3   222 A       1       0
4   333 B       0       1

Panda Kim

This gives the provided output but doesn't really follow the logic: "*I need to drop duplicates*"/"*If value in ID in my DF is duplicated…*", this would keep duplicates and remove non-duplicated non-matches – mozway Dec 12 '22 at 15:10
Simply put, the given problem doesn't seem to have very strict logic. If [666 B 1 0] and [666 A 0 1] both exist, which one should be removed? How should that be handled? I gave that answer because the dataset may never contain such a case, as in the example. Askers cannot always express themselves precisely. I don't see enough defined logic to say that my answer doesn't fit it. At the time, I offered the simplest way to solve the example. – Panda Kim Dec 12 '22 at 15:30
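The disagreement in these comments can be made concrete with a small sketch. The data below is hypothetical and not from the question: ID 444 occurs only once but matches a drop condition, and ID 666 is the ambiguous pair mentioned in the comment. The third variant, `literal`, is not from either answer; it is one possible literal reading of the question's rule, shown only for comparison.

```python
import pandas as pd

# Hypothetical rows chosen to expose the edge cases discussed above.
df = pd.DataFrame({
    "ID":   [111, 111, 444, 666, 666],
    "TYPE": ["A", "B", "B", "B", "A"],
    "TG_A": [1, 1, 1, 1, 0],
    "TG_B": [0, 0, 0, 0, 1],
})

m1 = df["TYPE"].eq("B") & df["TG_A"].eq(1)
m2 = df["TYPE"].eq("A") & df["TG_B"].eq(1)

# Accepted answer: one row per ID, the first non-matching one if any exists.
accepted = df.loc[(~(m1 | m2)).groupby(df["ID"]).idxmax()]

# Answer 2: row-wise filter that ignores whether ID is duplicated.
answer2 = df[df["TYPE"].eq("A").eq(df["TG_A"])]

# Assumed literal reading: drop matching rows only when the ID is duplicated.
literal = df[~(df["ID"].duplicated(keep=False) & (m1 | m2))]

print(accepted.index.tolist())  # [0, 2, 3]
print(answer2.index.tolist())   # [0]
print(literal.index.tolist())   # [0, 2]
```

On this data the accepted answer keeps the single 444 row and the first 666 row, Answer 2 drops both IDs entirely, and the literal reading keeps 444 but drops both 666 rows. Which behaviour is right depends on what the asker wants for cases the example does not cover.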
