-1

I have Nine columns 'instlevel1','instlevel2','instlevel3', 'instlevel4', 'instlevel5','instlevel6','instlevel7','instlevel8','instlevel9'

the values on this column are populated as follow : if instlevel1 value is 1, all others values for are 0, if instlevel2 value is 1, all others values for all others columns (including instlevel1) is 0.

I want to "pivot" this on one column. I got the desired results. but I wonder if there is a most efficient way to do this. because this case is very repetitive. here is the code for what I had done.

nivelEducacion = test[['instlevel1','instlevel2','instlevel3', 'instlevel4', 'instlevel5','instlevel6','instlevel7','instlevel8','instlevel9']].idxmax(axis=1)

test['nivelEducacion'] = nivelEducacion
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel1'], '1')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel2'], '2')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel3'], '3')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel4'], '4')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel5'], '5')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel6'], '6')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel7'], '7')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel8'], '8')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel9'], '9')
test['nivelEducacion'] = test.nivelEducacion.astype('category')
test = test.drop(['instlevel1', 'instlevel2','instlevel3','instlevel4','instlevel5','instlevel6','instlevel7','instlevel8','instlevel9'], axis=1)
Vivek Kumar
  • 35,217
  • 8
  • 109
  • 132

1 Answers1

0

You can use melt's function from pandas. It's not maybe the best solution but it do the job:

s = pd.Series(list('aaabbbccddefgh')).astype('category') # generate fake dataset
df = pd.get_dummies(s) # fake df like you have (One Hot Encoded)

df2 = pd.melt(df, value_vars=["a", "b", "c", "d", "e", "f", "g", "h"])
df2 = df2[df2.value == 1]  # to keep only existing categories
df2.drop("value", axis=1, inplace=True)

Another solution I found is this one

x = df.stack()  # in that case you have to restrict only to your columns
df2 = pd.Series(pd.Categorical(x[x!=0].index.get_level_values(1))).to_frame()

I hope it helps,

Nicolas

Nicolas M.
  • 1,472
  • 1
  • 13
  • 26