Replace duplicated values across different columns in pandas with '0'

Question

I have pandas dataframe like this (10 columns and thousands of rows):

    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
     a    c     a    d    b    f    a    c    g     b
     b    g     i    k    l    r    k    b    m     l

It has duplicates. i want to replace duplicates with (0) across the rows and keep only the first occurrence. So, the result would be like this:

    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
     a    c     0    d    b    f    0    0    g     0
     0    0     i    k    l    r    0    0    m     0

The values in the dataframe have type "object". I am using python 3.5.

I found a solution in Replace duplicate values across columns in Pandas but it didn't work for me. No replacements happened.

Any other ideas to solve this issue?

score 3 · Answer 1 · edited Nov 10 '21 at 22:36

3

You can stack to get a series, check that series for duplicated, unstack the series and mask:

df.mask(df.stack().duplicated().unstack(),0)

output:

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0    a    c    0    d    b    f    0    0    g     0
1    0    0    i    k    l    r    0    0    m     0

edited Nov 10 '21 at 22:36

halfer

19,824
17
99
186

answered Apr 27 '20 at 19:57

Quang Hoang

146,074
10
56
74

score 0 · Answer 2 · answered Apr 27 '20 at 20:01

0

Why didn't this work?

is_duplicate = df.apply(pd.Series.duplicated, axis=1)
print(df.where(~is_duplicate, 0))

Works fine for me:

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0    a    c    0    d    b    f    0    0    g     0
1    b    g    i    k    l    r    0    0    m     0

answered Apr 27 '20 at 20:01

NYC Coder

7,424
2
11
24

because then `duplicated` is applied on each row separately. That is, the first `b` in second row is not a duplicate, which is not OP wants. – Quang Hoang Apr 27 '20 at 20:09

Replace duplicated values across different columns in pandas with '0'

2 Answers2

Linked