1

I have pandas dataframe like this (10 columns and thousands of rows):

    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
     a    c     a    d    b    f    a    c    g     b
     b    g     i    k    l    r    k    b    m     l

It has duplicates. i want to replace duplicates with (0) across the rows and keep only the first occurrence. So, the result would be like this:

    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
     a    c     0    d    b    f    0    0    g     0
     0    0     i    k    l    r    0    0    m     0

The values in the dataframe have type "object". I am using python 3.5.

I found a solution in Replace duplicate values across columns in Pandas but it didn't work for me. No replacements happened.

Any other ideas to solve this issue?

Taie
  • 1,021
  • 16
  • 29

2 Answers2

3

You can stack to get a series, check that series for duplicated, unstack the series and mask:

df.mask(df.stack().duplicated().unstack(),0)

output:

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0    a    c    0    d    b    f    0    0    g     0
1    0    0    i    k    l    r    0    0    m     0
halfer
  • 19,824
  • 17
  • 99
  • 186
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
0

Why didn't this work?

is_duplicate = df.apply(pd.Series.duplicated, axis=1)
print(df.where(~is_duplicate, 0))

Works fine for me:

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0    a    c    0    d    b    f    0    0    g     0
1    b    g    i    k    l    r    0    0    m     0
NYC Coder
  • 7,424
  • 2
  • 11
  • 24
  • because then `duplicated` is applied on each row separately. That is, the first `b` in second row is not a duplicate, which is not OP wants. – Quang Hoang Apr 27 '20 at 20:09