
Here is my data frame, which I have already split into new columns:

DocID       0   1    2   3   4    5

CAT123     CAT  1   12 123  123  123
DOG14567   DOG  1   14 145 1456 14567
BIRD32     BIRD 3   32  32   32   32

I would like to replace every repeated value within a row with NaN, so that the result looks like this:

    DocID       0   1    2   3   4    5

    CAT123     CAT  1   12 123  NaN  NaN
    DOG14567   DOG  1   14 145 1456 14567
    BIRD32     BIRD 3   32  NaN NaN  NaN

How can I do this? I only know how to drop whole rows or columns. Thank you in advance.

Hook Im

3 Answers


Using duplicated together with mask:

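import pandas as pd

# For each row, flag values that already appeared further left, then set those cells to NaN
df = df.mask(df.apply(pd.Series.duplicated, axis=1))
print(df)

      DocID     0  1   2      3       4        5
0    CAT123   CAT  1  12  123.0     NaN      NaN
1  DOG14567   DOG  1  14  145.0  1456.0  14567.0
2    BIRD32  BIRD  3  32    NaN     NaN      NaN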
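A self-contained run of the same idea, with the question's frame typed in by hand (this assumes the split parts are stored as integers; strings such as '123' would be compared the same way). Because pd.Series.duplicated defaults to keep='first', the first occurrence in each row survives and only the later repeats become NaN.

```python
import pandas as pd

# The question's data, typed in by hand (assumption: numeric parts are ints)
df = pd.DataFrame(
    {
        "DocID": ["CAT123", "DOG14567", "BIRD32"],
        0: ["CAT", "DOG", "BIRD"],
        1: [1, 1, 3],
        2: [12, 14, 32],
        3: [123, 145, 32],
        4: [123, 1456, 32],
        5: [123, 14567, 32],
    }
)

# Row-wise boolean frame: True where the value already appeared further left
# in the same row (keep="first" is the default for duplicated)
dupes = df.apply(pd.Series.duplicated, axis=1)

# mask() replaces the cells where the condition is True with NaN
result = df.mask(dupes)
print(result)
```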
BENY

A two-line alternative using where:

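import numpy as np
import pandas as pd
new_df = df.apply(pd.Series.duplicated, axis=1)  # True where a value repeats within its row
df = df.where(~new_df, np.nan)  # keep the non-repeated cells, NaN elsewhere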
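A small check, on a made-up two-row frame rather than the question's data, that this where call and the mask call from the first answer do the same thing: where keeps the cells where the condition is True and mask blanks them, so negating the condition turns one into the other. The second row also shows that a repeat is caught even when it is not next to its first occurrence.

```python
import numpy as np
import pandas as pd

# Hypothetical illustrative frame (not the question's data)
df = pd.DataFrame({"a": [1, 5], "b": [1, 6], "c": [2, 5]})

# True where a value already appeared further left in the same row
dupes = df.apply(pd.Series.duplicated, axis=1)

# where() keeps cells where the condition holds; mask() blanks them
via_where = df.where(~dupes, np.nan)
via_mask = df.mask(dupes)

print(via_where.equals(via_mask))  # True
```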
Karn Kumar

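You can also find the cells that equal their left-hand neighbour with df.iloc[:, i + 1] - df.iloc[:, i] == 0, where i iterates over the column positions. This assumes DocID has been dropped (or moved to the index) and that columns 1 to 5 are numeric. Going from right to left ensures that each comparison still sees the original values rather than a NaN written in an earlier step: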

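import numpy as np

# Right to left, so no comparison sees a cell that was already set to NaN
for i in [4, 3, 2, 1]:
    same_as_left = df.iloc[:, i + 1] - df.iloc[:, i] == 0
    # Assign through df.loc instead of chained indexing so the change sticks
    df.loc[same_as_left, df.columns[i + 1]] = np.nan
print(df)

      0  1   2      3       4        5
0   CAT  1  12  123.0     NaN      NaN
1   DOG  1  14  145.0  1456.0  14567.0
2  BIRD  3  32    NaN     NaN      NaN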
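The loop only blanks a value that equals its left-hand neighbour, whereas the duplicated-based answers blank any repeat in the row. For the question's data both rules give the same result, but they can differ. The sketch below uses a hypothetical all-numeric frame; comparing the frame with df.shift(axis=1) is my own substitution for the loop (not the answer's code) and contrasts the neighbour rule with the duplicated rule.

```python
import pandas as pd

# Hypothetical numeric frame (not the question's data): in the second row
# the value 5 repeats, but not in neighbouring columns
df = pd.DataFrame({1: [1, 5], 2: [1, 6], 3: [2, 5]})

# Neighbour rule (what the loop does): compare each column with the original
# column to its left; the first column has no left neighbour, so it is kept
same_as_left = df.eq(df.shift(axis=1))
neighbour = df.mask(same_as_left)

# Any-repeat rule (what the duplicated-based answers do)
any_repeat = df.mask(df.apply(pd.Series.duplicated, axis=1))

print(neighbour)
print(any_repeat)
```

In the second row, the neighbour rule keeps the final 5 while the duplicated rule turns it into NaN, so it is worth choosing the rule that matches what "duplicated" means for your data.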