How to remove duplicate values across columns in pandas?

Question

Here is my data frame, which I have split into new columns.

DocID       0   1    2   3   4    5

CAT123     CAT  1   12 123  123  123
DOG14567   DOG  1   14 145 1456 14567
BIRD32     BIRD 3   32  32   32   32

I would like to delete the duplicated values in each row and get a result like this:

    DocID       0   1    2   3   4    5

    CAT123     CAT  1   12 123  NaN  NaN
    DOG14567   DOG  1   14 145 1456 14567
    BIRD32     BIRD 3   32  NaN NaN  NaN

How can I do this? I only know how to drop rows or columns. Thank you in advance.

score 4 · Answer 1 · answered Dec 18 '18 at 03:28

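Use duplicated + mask: apply pd.Series.duplicated to each row (axis=1) to flag values that already appeared further left, then mask the flagged cells with NaN.

df = df.mask(df.apply(pd.Series.duplicated, axis=1))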
df
Out[8]: 
      DocID     0  1   2      3       4        5
0    CAT123   CAT  1  12  123.0     NaN      NaN
1  DOG14567   DOG  1  14  145.0  1456.0  14567.0
2    BIRD32  BIRD  3  32    NaN     NaN      NaN
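
For anyone who wants to run this end to end, here is a minimal sketch. The frame is rebuilt by hand from the table in the question, with DocID assumed to be a regular column (as the output above suggests); the names dup and result are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question; DocID is assumed to be a regular column.
df = pd.DataFrame({
    "DocID": ["CAT123", "DOG14567", "BIRD32"],
    0: ["CAT", "DOG", "BIRD"],
    1: [1, 1, 3],
    2: [12, 14, 32],
    3: [123, 145, 32],
    4: [123, 1456, 32],
    5: [123, 14567, 32],
})

# For each row, flag every value that already appeared further left in that row.
dup = df.apply(pd.Series.duplicated, axis=1)

# mask() returns a new frame with the flagged cells replaced by NaN.
result = df.mask(dup)
print(result)
```

Columns that receive a NaN come back as float, which is why 123 prints as 123.0 in the output above.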

score 1 · Answer 2 · answered Dec 18 '18 at 03:55

A two-line answer (with numpy imported as np and pandas as pd). Note that where returns a new frame and leaves df unchanged:

 dup = df.apply(pd.Series.duplicated, axis=1)
 df.where(~dup, np.nan)
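
As a quick check, the sketch below suggests that where on the negated flags gives the same frame as mask on the flags themselves. It assumes DocID is the index here (the question does not say), and the names dup, kept and masked are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; DocID is assumed to be the index in this sketch.
df = pd.DataFrame(
    [
        ["CAT", 1, 12, 123, 123, 123],
        ["DOG", 1, 14, 145, 1456, 14567],
        ["BIRD", 3, 32, 32, 32, 32],
    ],
    index=pd.Index(["CAT123", "DOG14567", "BIRD32"], name="DocID"),
)

# True wherever a value repeats one seen earlier in the same row.
dup = df.apply(pd.Series.duplicated, axis=1)

kept = df.where(~dup, np.nan)  # keep the non-duplicates, NaN elsewhere
masked = df.mask(dup)          # blank out the duplicates

print(kept)
print(kept.equals(masked))
```

Since NaN is already the default replacement for where, the second argument can be left out.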

Karn Kumar

Alessandro Solbiati · Answer 3 · 2018-12-18T04:59:51.177

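You can find the duplicates by comparing each column with the one to its left, df.iloc[:, i + 1] - df.iloc[:, i] == 0, where i iterates over the column positions. This only catches values repeated in adjacent numeric columns, which is the case here.

for i in [4, 3, 2, 1]:
    # go right to left so each comparison still sees the original left-hand column
    cur, prev = df.columns[i + 1], df.columns[i]
    df[cur] = df[cur].mask(df[cur] - df[prev] == 0)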
df
      0  1   2      3       4        5
0   CAT  1  12  123.0     NaN      NaN
1   DOG  1  14  145.0  1456.0  14567.0
2  BIRD  3  32    NaN     NaN      NaN
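
The same neighbor comparison can probably be done without the loop. This is only a sketch, assuming the frame has the string column 0 followed by numeric columns 1 to 5 as in the output above; num, same_as_left and result are illustrative names, and diff(axis=1) stands in for the explicit subtraction.

```python
import pandas as pd

# Frame as shown in this answer: no DocID column, labels 0..5.
df = pd.DataFrame({
    0: ["CAT", "DOG", "BIRD"],
    1: [1, 1, 3],
    2: [12, 14, 32],
    3: [123, 145, 32],
    4: [123, 1456, 32],
    5: [123, 14567, 32],
})

# Only the numeric columns can be subtracted.
num = df[[1, 2, 3, 4, 5]]

# diff(axis=1) subtracts the column to the left; 0 means "same as the left neighbor".
same_as_left = num.diff(axis=1).eq(0)

# Blank out the repeats and put the string column back in front.
result = pd.concat([df[[0]], num.mask(same_as_left)], axis=1)
print(result)
```

Like the loop, this misses a value that repeats with a different value in between (for example 1, 2, 1), so the duplicated-based answers are the safer choice when repeats are not guaranteed to be consecutive.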

edited Dec 18 '18 at 04:59

answered Dec 18 '18 at 03:50
