Duplicate values pandas

Question

I am new to pandas and have been trying to solve the following problem.

This is the problem statement: I want to drop any row that has a duplicate value in A but a non-duplicate value in B.

Here is the kind of output I want:

score 1 · Answer 1 · answered Sep 18 '19 at 12:33


If I understand correctly, this is what you need:

# Flag rows where exactly one of A or B differs from the previous row, then drop them
a = df['A'].ne(df['A'].shift()).ne(df['B'].ne(df['B'].shift()))
df[~a].reset_index(drop=True)

Output
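To see what this expression keeps, here is a minimal self-contained sketch. The frame is hypothetical sample data, not the asker's original: it was chosen only so that one value of A repeats with a different B and another repeats with the same B. The names `a_changed`, `b_changed`, and `mask` are introduced just for readability.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the asker's original frame).
df = pd.DataFrame({"A": [2, 2, 3, 3], "B": ["x", "y", "x", "x"]})

# True where the value differs from the row above (always True on the first row).
a_changed = df["A"].ne(df["A"].shift())
b_changed = df["B"].ne(df["B"].shift())

# Rows where exactly one of the two columns changed are flagged and dropped.
mask = a_changed.ne(b_changed)
result = df[~mask].reset_index(drop=True)
print(result)
```

On this sample the row (2, y) is dropped and (2, x) is kept, so the first row of the A == 2 group survives.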


moys


This is not the expected output. – ansev Sep 18 '19 at 12:43
@Krishan, can you clarify? Based on your description, my code keeps the first row; however, your picture seems to keep the second row. Which is correct? – moys Sep 18 '19 at 12:51

score 1 · Answer 2 · answered Sep 18 '19 at 12:42


I think you need:

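import pandas as pd
# cond is True where each column's value equals its neighbor in the row above or below
cond = (df.eq(df.shift(-1)) | df.eq(df.shift())).all(axis=1)
pd.concat([df[~cond].groupby('A').last().reset_index(), df[cond]])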

    A   B
0   2   y
2   3   x
3   3   x
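The same approach can be run end to end as a minimal sketch. The input frame below is hypothetical: it is an assumption chosen to be consistent with the output shown above, not the asker's original data. The names `kept_last` and `result` are introduced for illustration.

```python
import pandas as pd

# Hypothetical sample data (an assumption, chosen to be consistent with the output above).
df = pd.DataFrame({"A": [2, 2, 3, 3], "B": ["x", "y", "x", "x"]})

# True for rows where every column equals its neighbor in the row above or below.
cond = (df.eq(df.shift(-1)) | df.eq(df.shift())).all(axis=1)

# Among the remaining rows, keep only the last row for each value of A.
kept_last = df[~cond].groupby("A").last().reset_index()

# Append the fully duplicated rows unchanged (they keep their original index labels).
result = pd.concat([kept_last, df[cond]])
print(result)
```

On this sample the last row of the A == 2 group, (2, y), is kept together with both (3, x) rows. The index is not reset, so the labels 0, 2, 3 mix the new label from `reset_index()` with the original ones; append `.reset_index(drop=True)` if a clean index is wanted.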


ansev

