3

I have four a data frame as follows:

Proxyid A B C D
123 1 0 0 0
456 1 1 1 1
789 0 0 0 0

This is the idea of the data frame. now I want to duplicate the rows where there are more than one 1. and assign values as follows.

Proxyid A B C D
123 1 0 0 0
456 1 0 0 0
456 0 1 0 0
456 0 0 1 0
456 0 0 0 1
789 0 0 0 0

I would really appreciate any input. Thank you.

Yesha Shah
  • 31
  • 1

1 Answers1

1

One option via pd.get_dumies:

df1 = (
    pd.get_dummies(
        df.set_index('Proxyid')
        .mul(df.columns[1:])
        .replace('', np.NAN)
        .stack()
    )
    .reset_index().drop('level_1', 1)
) 
result = df1.append(df[~df.Proxyid.isin(df1.Proxyid)])

OUTPUT:

Proxyid A B C D
0 123 1 0 0 0
1 456 1 0 0 0
2 456 0 1 0 0
3 456 0 0 1 0
4 456 0 0 0 1
2 789 0 0 0 0

If you've extra columns just add them in set_index and use:

df1 = df.set_index(['Proxyid', 'test'])
df1 = pd.get_dummies(df1.mul(df1.columns).replace('', np.NAN).stack()).reset_index()
result = df1.append(df[~df.Proxyid.isin(df1.Proxyid)])
Nk03
  • 14,699
  • 2
  • 8
  • 22
  • 1
    I wanted to add what if I have more than those 5 columns? and I just want to operate on those mentioned columns? – Yesha Shah May 24 '21 at 21:10