Drop rows where column duplicate and and other column null

Question

Seeking to drop rows where col1 is duplicate and col2 is null, but only when both conditions are met.

Therefore, where col1 is duplicate and col2 is not null, row should not be dropped.

d = {'col1': ['A1', 'B4', 'A2', 'A1', 'B4', 'B4'], 'col2': [np.nan, 'ref4', np.nan, 'ref3', 'ref1', 'ref3']}
df = pd.DataFrame(data=d)

  col1  col2
0   A1   NaN
1   B4  ref4
2   A2   NaN
3   A1  ref3
4   B4  ref1
5   B4  ref3

index row 0 satisfies both conditions and would therefore be the only row dropped

Output:

  col1  col2
1   B4  ref4
2   A2   NaN
3   A1  ref3
4   B4  ref1
5   B4  ref3

I have tried the following code but it does not perform as needed.....

m1 = df['col2'].notna()
m2 = df['col1'].duplicated()

df = df[m1 & m2]

print(df)

Maybe: `df[df['col1'].duplicated(keep=False) ^ df['col2'].isna()]` ? — Jon Clements, Jun 07 '22 at 19:35
That seems to have done the trick. Thank you. By the way what is the significance of the symbol '^' ? — Diop Chopra, Jun 07 '22 at 19:39

score 1 · Answer 1 · answered Jun 07 '22 at 19:50

1

Create another dataframe that drops rows that meet that condition. df_2 = df[(~df['col'].duplicated())&(df['col2'].isnull())].

answered Jun 07 '22 at 19:50

Ted

41
4

Souhaib · Answer 2 · 2022-06-07T20:38:12.473

@Jon_Clements' comment is nice, but there's a problem with it. Using only the XOR operator in this case will result in removing rows where col1 is not duplicate and col2 is not null at the same time.

Here's an example of this

  col1  col2
0   A1  NaN
1   B4  ref4
2   A2  NaN
3   A1  ref3
4   B4  ref1
5   B4  ref3
6   B5  ref5

To solve this use the following snippet:

df[df['col1'].duplicated(keep=False) ^ df['col2'].isna() | ~df['col1'].duplicated(keep=False) & ~df['col2'].isna()]

Results:


  col1  col2
1   B4  ref4
2   A2  NaN
3   A1  ref3
4   B4  ref1
5   B4  ref3
6   B5  ref5

Drop rows where column duplicate and and other column null

2 Answers2