OneHotEncoding method is failing in sklearn

Question

I have a data frame, which I will call df for now, and I obtain an ndarray from it as follows:

X=df.iloc[:,5:].values

which I want to use for a machine learning model. I need to one-hot encode the 12th column of X.

Using sklearn, I first label-encoded it as follows:

from sklearn.preprocessing import LabelEncoder,OneHotEncoder
labelencoder_x=LabelEncoder()
X[:,12]=labelencoder_x.fit_transform(X[:,12])

and this works fine.

Next I try one-hot encoding as follows:

onehotencoder=OneHotEncoder(categorical_features=[12])
X=onehotencoder.fit_transform(X).toarray()

and I get the following error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Could someone help me with this? I'm new to programming in Python and am eager to learn what is wrong with what I did and how I can fix it. I tried some debugging: I checked whether np.nan is in the 12th column and got False, and I checked the type of each element in the 12th column, which is int.

Even though you have specified the `categorical_features` parameter, the one-hot encoder still validates the whole input (not just that column), hence the error. Either pass only the single column and then append the transformed data back, or fix the data. – Vivek Kumar, May 02 '18 at 14:49
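A minimal sketch of what this comment suggests, under stated assumptions: the small array below is a made-up stand-in for X (the categorical column sits at index 2 rather than 12, and a NaN is planted in column 0). It first looks for missing values across all columns, then encodes only the categorical column and stacks the result next to the untouched columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for X: categorical column at index 2,
# with a NaN hiding in column 0 (not in the categorical column).
X = np.array([[1.0, 10.0, 0],
              [np.nan, 20.0, 1],
              [3.0, 30.0, 2],
              [4.0, 40.0, 1]], dtype=object)
cat_col = 2  # index 12 in the question

# Check every column for missing values, not only the categorical one.
nan_columns = np.where(pd.isnull(X).any(axis=0))[0]
print("columns containing NaN:", nan_columns)

# Encode just the categorical column; indexing with a list keeps it 2-D.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(X[:, [cat_col]].astype(int)).toarray()

# Append the encoded columns to the remaining, untouched columns.
rest = np.delete(X, cat_col, axis=1)
X_encoded = np.hstack([rest, encoded])
print(X_encoded.shape)
```

The NaN in column 0 is still present in X_encoded, so it would still need to be imputed or dropped before fitting most models. Placing the encoded columns last is an arbitrary choice made for this sketch.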

score 1 · Answer 1 · answered May 02 '18 at 14:54

If only one column is categorical and you want it one-hot encoded, it is worth trying pandas' get_dummies(), which should give the result you are expecting. See the pandas docs.
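As a rough illustration of the get_dummies() route, here is a small sketch. The frame and the column names "age" and "city" are invented for the example and are not from the question.

```python
import pandas as pd

# Hypothetical frame; "city" stands in for the categorical column.
df = pd.DataFrame({"age": [25, 32, 47],
                   "city": ["Paris", "Rome", "Paris"]})

# Only the listed column is expanded; other columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["city"])
print(encoded.columns.tolist())  # ['age', 'city_Paris', 'city_Rome']
```

Because get_dummies() works directly on the string column, no LabelEncoder step is needed first.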

mad_

But make sure not to use this if data is split into train and test. – Vivek Kumar May 02 '18 at 14:56
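The reason for this caveat is that get_dummies() derives its output columns from whatever categories appear in the data it is given, so separately encoded train and test sets can end up with different columns. The sketch below uses invented data to show the mismatch, along with one possible workaround (reindexing the test columns to match train). Fitting sklearn's OneHotEncoder on the training data and reusing it on the test data is another option.

```python
import pandas as pd

# Hypothetical split: the test set happens to contain only one category.
train = pd.DataFrame({"city": ["Paris", "Rome", "Oslo"]})
test = pd.DataFrame({"city": ["Rome", "Rome"]})

train_enc = pd.get_dummies(train, columns=["city"])
test_enc = pd.get_dummies(test, columns=["city"])
print(train_enc.shape, test_enc.shape)  # (3, 3) (2, 1): column counts differ

# One possible fix: align test to train's columns, filling absent ones with 0.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(test_aligned.columns.tolist())
```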
