I am seeking to drop rows where col1 is duplicated and col2 is null, but only when both conditions are met.
Therefore, where col1 is duplicated and col2 is not null, the row should not be dropped.
import numpy as np
import pandas as pd
d = {'col1': ['A1', 'B4', 'A2', 'A1', 'B4', 'B4'], 'col2': [np.nan, 'ref4', np.nan, 'ref3', 'ref1', 'ref3']}
df = pd.DataFrame(data=d)
  col1  col2
0   A1   NaN
1   B4  ref4
2   A2   NaN
3   A1  ref3
4   B4  ref1
5   B4  ref3
Row index 0 satisfies both conditions (its col1 value, A1, also appears at index 3, and its col2 is null), so it would be the only row dropped.
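One detail worth checking is what "duplicate" means here. Row 0 is the *first* A1, so for it to count as a duplicate, the condition has to be "this col1 value occurs more than once anywhere in the column". In pandas that is `duplicated(keep=False)`; the default `duplicated()` (`keep='first'`) only flags the second and later occurrences. The sketch below, assuming that interpretation, tabulates each condition per row (`dup_any`, `dup_later`, `col2_null` and `check` are just illustrative names):

```python
import numpy as np
import pandas as pd

d = {'col1': ['A1', 'B4', 'A2', 'A1', 'B4', 'B4'],
     'col2': [np.nan, 'ref4', np.nan, 'ref3', 'ref1', 'ref3']}
df = pd.DataFrame(data=d)

# keep=False flags every occurrence of a repeated value, including the first one
dup_any = df['col1'].duplicated(keep=False)
# the default keep='first' flags only the second and later occurrences
dup_later = df['col1'].duplicated()
# True where col2 is missing (NaN)
col2_null = df['col2'].isna()

check = pd.DataFrame({'dup_any': dup_any,
                      'dup_later': dup_later,
                      'col2_null': col2_null,
                      'both': dup_any & col2_null})
print(check)
```

With `keep=False`, only row 0 has `both` set, which matches the expected result; with the default `keep='first'`, row 0 would not be flagged as a duplicate at all.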
Desired output:
  col1  col2
1   B4  ref4
2   A2   NaN
3   A1  ref3
4   B4  ref1
5   B4  ref3
I have tried the following code, but it does not produce the desired output:
m1 = df['col2'].notna()       # True where col2 has a value
m2 = df['col1'].duplicated()  # True only for the 2nd and later occurrences of a col1 value
df = df[m1 & m2]
print(df)
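The attempt above keeps rows where *both* masks are true (col2 present and col1 a repeat), so on the sample data it returns only rows 3, 4 and 5. What the question describes is the opposite: drop a row only when "col1 is duplicated" and "col2 is null" hold together, i.e. keep the negation of that conjunction. A minimal sketch, assuming "duplicated" means "the col1 value occurs more than once" (`keep=False`, as the expected output implies); the names `m_dup`, `m_null`, `out` and `out_alt` are illustrative:

```python
import numpy as np
import pandas as pd

d = {'col1': ['A1', 'B4', 'A2', 'A1', 'B4', 'B4'],
     'col2': [np.nan, 'ref4', np.nan, 'ref3', 'ref1', 'ref3']}
df = pd.DataFrame(data=d)

# True where the col1 value appears more than once (every occurrence, not just repeats)
m_dup = df['col1'].duplicated(keep=False)
# True where col2 is missing
m_null = df['col2'].isna()

# Drop a row only when BOTH conditions hold, i.e. keep the negation of their AND
out = df[~(m_dup & m_null)]
print(out)

# Equivalent form by De Morgan's law: keep rows that have a col2 value
# or whose col1 value is unique
out_alt = df[df['col2'].notna() | ~m_dup]
```

Both forms keep index rows 1 to 5 and drop only row 0. Row 2 (A2, NaN) survives because A2 is unique, and rows 3 to 5 survive because col2 is populated.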