5

I'm using LabelEncoder and OneHotEncoder to handle categorical data in my dataset. One column can take two values, 'Petrol' or 'Diesel', and I want to encode it. Running the code below gives an error.

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder,OneHotEncoder

dataset = pd.read_csv('ToyotaCorolla.csv')
X = dataset.iloc[:, 1:10].values
y = dataset.iloc[:, 0].values

labelencoder_X = LabelEncoder()
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [3])
X = onehotencoder.fit_transform(X).toarray()

Column 3 of X is the one that holds the categorical values, but the code raises "ValueError: could not convert string to float: 'Diesel'". I don't know where I'm going wrong. Please help. Thanks!

desertnaut
Kamal Aujla
  • Question has nothing to do with `spyder` - kindly do not spam irrelevant tags (removed & replaced with `scikit-learn`). – desertnaut Apr 09 '19 at 21:17

2 Answers

5

categorical_features is deprecated (since scikit-learn 0.20, and removed in 0.22). Instead, apply the encoder directly to your categorical column:

onehotencoder = OneHotEncoder(categories='auto')
feature = onehotencoder.fit_transform(X[:, 3].reshape(-1, 1))
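
To get a full feature matrix back, the encoded column still has to be recombined with the remaining columns. A minimal sketch of one way to do that, assuming the other columns are numeric; the small array below is made-up stand-in data, not the actual ToyotaCorolla.csv contents:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for X: column 3 holds the fuel type as a string.
X = np.array([
    [23, 46986, 90, "Diesel", 1],
    [24, 72937, 90, "Petrol", 0],
    [30, 38500, 110, "Petrol", 1],
], dtype=object)

encoder = OneHotEncoder(categories="auto")
# The encoder expects 2-D input, hence the reshape; the result is sparse,
# so convert it to a dense array before stacking.
fuel = encoder.fit_transform(X[:, 3].reshape(-1, 1)).toarray()

# Drop the original string column and append the encoded columns.
rest = np.delete(X, 3, axis=1).astype(float)
X_encoded = np.hstack([rest, fuel])
```

Note that the `astype(float)` call only succeeds if no other column of X still contains strings, which is worth checking if the original error persists.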
Juan Carlos Ramirez
0

This error occurs when X contains a column whose categories are stored as strings. When I hit this error, I applied LabelEncoder to all the categorical columns in X, as you did for column 3, and then applied OneHotEncoder to column 3.

So what you have to do is label-encode all the categorical columns in X and then apply the one-hot encoder to your desired column.
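
In scikit-learn 0.20 and later, OneHotEncoder accepts string categories directly, so a possible alternative to label-encoding each column by hand is to let ColumnTransformer encode every non-numeric column in one step. This is a different technique from the one described above, shown only as a sketch; the frame and its column names are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical frame with two string-valued columns.
df = pd.DataFrame({
    "Age": [23, 24, 30],
    "FuelType": ["Diesel", "Petrol", "Petrol"],
    "Color": ["Blue", "Red", "Blue"],
    "HP": [90, 90, 110],
})

# Treat every non-numeric column as categorical.
cat_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), cat_cols)],
    remainder="passthrough",   # keep the numeric columns unchanged
    sparse_threshold=0,        # always return a dense array
)
# Encoded columns come first, followed by the passthrough columns.
X = ct.fit_transform(df)
```

With this approach no string column is left behind in X, which is the situation that triggers the "could not convert string to float" error.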

raj kumar