I am using pd.get_dummies to transform a categorical vector with 4 labels (strings) into a 2D array with 4 columns. However, I couldn't find a way to go back to the original values afterwards. I also couldn't do this when using sklearn.preprocessing.OneHotEncoder.

What is the best way to one-hot-encode a categorical vector while keeping the ability to recover the original values afterwards?

Cranjis

2 Answers

You can make use of the inverse_transform method of sklearn.preprocessing.OneHotEncoder to do it. I have illustrated it with an example below:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male'], ['Female'], ['Female']]
enc.fit(X)
enc.categories_

[array(['Female', 'Male'], dtype=object)]

enc.transform([['Female'], ['Male']]).toarray()

array([[1., 0.],
       [0., 1.]])

enc.inverse_transform([[0, 1], [1,0], [0, 1]])

array([['Male'],
       ['Female'],
       ['Male']], dtype=object)

To build a dictionary that maps each category to its one-hot vector, you could do this:

A = {}
for i in enc.categories_[0]:
    A[i] = enc.transform([[i]]).toarray()

There may well be a better way to do this, though.
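
As one possible alternative to the loop, the mapping can be built with a single transform call over all fitted categories. This is only a minimal sketch: the four labels and the names `labels` and `category_to_vector` are made up for illustration, and it assumes the default encoder settings (no `drop`), so each category gets its own column.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: a categorical vector with 4 string labels.
labels = np.array([["cat"], ["dog"], ["bird"], ["fish"], ["dog"]])

enc = OneHotEncoder(handle_unknown="ignore")
encoded = enc.fit_transform(labels).toarray()  # shape (5, 4)

# Encode every known category once; row k belongs to categories_[0][k].
cats = enc.categories_[0]
vectors = enc.transform(cats.reshape(-1, 1)).toarray()
category_to_vector = {str(c): v for c, v in zip(cats, vectors)}

# Round trip: recover the original labels from the encoded array.
decoded = enc.inverse_transform(encoded)
```

The round trip only works as long as the fitted encoder object is kept around, since it is the encoder that remembers which column belongs to which category.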

Sunderam Dubey
Parthasarathy Subburaj

You can find the maximum of each row (the 1) and replace the row with the name of the column that holds it.

import pandas as pd

df = pd.DataFrame({"A":[0,1,0,0],"B":[1,0,0,0],"C":[0,0,1,0], "D":[0,0,0,1]})

def decode(row):
    for c in df.columns:
        if row[c]==1:
            return c


df = df.apply(decode,axis=1)
print(df)

Output:

0    B
1    A
2    C
3    D
dtype: object
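
The same idea can probably be written without a Python-level loop by using `idxmax`, which returns the label of the column holding the row maximum. The sketch below assumes the data came from `pd.get_dummies`; the sample labels and the variable names are hypothetical, and `dtype=int` is passed only so the dummy columns are plain 0/1 integers.

```python
import pandas as pd

# Hypothetical categorical vector with 4 string labels.
labels = pd.Series(["B", "A", "C", "D"])

# One-hot encode; the column names are the category labels.
dummies = pd.get_dummies(labels, dtype=int)

# For each row, take the name of the column that contains the 1.
decoded = dummies.idxmax(axis=1)
print(decoded.tolist())
```

This relies on every row having exactly one 1; an all-zero row would silently decode to the first column. Recent pandas releases (1.5 and later, as far as I know) also ship `pd.from_dummies` for this purpose, which may be worth checking in your version.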
Strange