What is the best way to keep column names after applying OneHotEncoder in Python?

Question

What is the best way to keep column names after applying a one-hot encoder in Python? All my features are categorical. After importing the dataset, it looks like this:

 PlaceID       Date  ...  BlockedRet  OverallSeverity
0    23620  1/10/2019  ...           1                1
1    13352  1/10/2019  ...           1                1
2    13674  1/10/2019  ...           1                1
3    13501  1/10/2019  ...           1                1
4    13675  1/10/2019  ...           1                1

[5 rows x 28 columns]

After choosing the features, I want to transform them with a one-hot encoder because most of them are categorical. I did that using:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

hotencode = OneHotEncoder(categorical_features=[0])
features = hotencode.fit_transform(features).toarray()

The result comes back without the original column names. How can I transform the features so that each new column keeps the original column name plus a suffix (0, 1, 2, 3, ...)?

Sorry but to me it's very unclear what you are asking: 1) What does your DataFrame `df` looks like after loading the data? Please add an example in the question; 2) How is `df.iloc[:,+2:-1]` supposed to perform one-hot-encoding? To me it looks like just selecting the 3rd column; 3) What do you want to obtain at the end? Please share an example of your desired output in the question. — UJIN, Nov 12 '19 at 10:22

Suhas_Pote · Answer 1 · 2019-11-12T12:59:06.033

Here is a simple example:

import pandas as pd

df = pd.DataFrame([
       ['green', 'Chevrolet', 2017],
       ['blue', 'BMW', 2015], 
       ['yellow', 'Lexus', 2018],
])
df.columns = ['color', 'make', 'year']

df

'''
    color       make  year
0   green  Chevrolet  2017
1    blue        BMW  2015
2  yellow      Lexus  2018
'''

Approach 1: One Hot Encoder

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Map each color to an integer label (sorted order: blue=0, green=1, yellow=2)
le_color = LabelEncoder()
df['color_encoded'] = le_color.fit_transform(df.color)

# One-hot encode the integer labels; the encoder expects a 2-D array
color_ohe = OneHotEncoder()
X = color_ohe.fit_transform(df.color_encoded.values.reshape(-1, 1)).toarray()

# Name the new columns Color_0, Color_1, ... and append them to the original frame
dfOneHot = pd.DataFrame(X, columns=["Color_" + str(i) for i in range(X.shape[1])])
df = pd.concat([df, dfOneHot], axis=1)

df

'''
    color       make  year  color_encoded  Color_0  Color_1  Color_2
0   green  Chevrolet  2017              1      0.0      1.0      0.0
1    blue        BMW  2015              0      1.0      0.0      0.0
2  yellow      Lexus  2018              2      0.0      0.0      1.0
'''

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

Approach 2: Get Dummies

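# Start from the three original columns (df also holds the Approach 1 columns by now).
# dtype=int keeps 0/1 output; recent pandas versions return booleans by default.
df_final = pd.concat([df[["color", "make", "year"]], pd.get_dummies(df["color"], prefix="color", dtype=int)], axis=1)
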
df_final

'''
    color       make  year  color_blue  color_green  color_yellow
0   green  Chevrolet  2017           0            1             0
1    blue        BMW  2015           1            0             0
2  yellow      Lexus  2018           0            0             1
'''

Reference:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html
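
When several columns are categorical, as in the question, `get_dummies` can also be given the whole frame together with a `columns` list, so each encoded column is prefixed with its source column name automatically. This is a minimal sketch on the same toy data; the choice of `["color", "make"]` and the name `df_dummies` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["green", "Chevrolet", 2017],
     ["blue", "BMW", 2015],
     ["yellow", "Lexus", 2018]],
    columns=["color", "make", "year"],
)

# Encode only the listed columns; the rest (here "year") pass through unchanged.
# dtype=int requests 0/1 integers instead of booleans.
df_dummies = pd.get_dummies(df, columns=["color", "make"], dtype=int)

print(df_dummies.columns.tolist())
```

The listed source columns are dropped from the result and replaced by their indicator columns, so no separate `concat` is needed.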

Welcome :).....Please up-vote and accept the answer once you have sufficient reputation. — Suhas_Pote, Nov 13 '19 at 15:33
