1

Seeking to drop rows where col1 is duplicate and col2 is null, but only when both conditions are met.

Therefore, where col1 is duplicate and col2 is not null, row should not be dropped.

d = {'col1': ['A1', 'B4', 'A2', 'A1', 'B4', 'B4'], 'col2': [np.nan, 'ref4', np.nan, 'ref3', 'ref1', 'ref3']}
df = pd.DataFrame(data=d)

  col1  col2
0   A1   NaN
1   B4  ref4
2   A2   NaN
3   A1  ref3
4   B4  ref1
5   B4  ref3

index row 0 satisfies both conditions and would therefore be the only row dropped

Output:

  col1  col2
1   B4  ref4
2   A2   NaN
3   A1  ref3
4   B4  ref1
5   B4  ref3

I have tried the following code but it does not perform as needed.....

m1 = df['col2'].notna()
m2 = df['col1'].duplicated()

df = df[m1 & m2]

print(df)
Diop Chopra
  • 319
  • 3
  • 10

2 Answers2

1

Create another dataframe that drops rows that meet that condition. df_2 = df[(~df['col'].duplicated())&(df['col2'].isnull())].

Ted
  • 41
  • 4
0

@Jon_Clements' comment is nice, but there's a problem with it. Using only the XOR operator in this case will result in removing rows where col1 is not duplicate and col2 is not null at the same time.

Here's an example of this

  col1  col2
0   A1  NaN
1   B4  ref4
2   A2  NaN
3   A1  ref3
4   B4  ref1
5   B4  ref3
6   B5  ref5

To solve this use the following snippet:

df[df['col1'].duplicated(keep=False) ^ df['col2'].isna() | ~df['col1'].duplicated(keep=False) & ~df['col2'].isna()]

Results:


  col1  col2
1   B4  ref4
2   A2  NaN
3   A1  ref3
4   B4  ref1
5   B4  ref3
6   B5  ref5
Souhaib
  • 119
  • 1
  • 6