"ValueError: could not convert string to float" while using OneHotEncoder for machine learning

Question

I'm using LabelEncoder and OneHotEncoder to handle categorical data in my dataset. One column can take only two values, 'Petrol' or 'Diesel', and I want to encode that column. When I run the following code, it raises an error:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder,OneHotEncoder

dataset = pd.read_csv('ToyotaCorolla.csv')
X = dataset.iloc[:, 1:10].values
y = dataset.iloc[:, 0].values

labelencoder_X = LabelEncoder()
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [3])
X = onehotencoder.fit_transform(X).toarray()
```

Column 3 (`X[:, 3]`) is the one that holds the categorical values, but the code fails with "ValueError: could not convert string to float: 'Diesel'". I don't know where I'm going wrong. Please help. Thanks!
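The error message itself can be reproduced with plain NumPy. This is a minimal sketch, assuming (as the wording of the error suggests) that the legacy `categorical_features` code path converts the whole of X to float, not just column 3, and that some other column of X still holds strings. The toy array and its column layout are hypothetical:

```python
import numpy as np

# Hypothetical toy feature matrix: column 3 has already been label-encoded
# (0/1), but column 2 still holds the raw fuel-type strings.
X = np.array([[23, 46986, 'Diesel', 0],
              [24, 72937, 'Petrol', 1]], dtype=object)

# Converting the entire object array to float fails on the first string
# it meets, which is what an encoder validating all of X would run into.
try:
    X.astype(np.float64)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

In other words, label-encoding column 3 alone does not help if any other column of X is still a string column.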

Question has nothing to do with `spyder` - kindly do not spam irrelevant tags (removed & replaced with `scikit-learn`). — desertnaut, Apr 09 '19 at 21:17

score 5 · Accepted Answer · answered Apr 09 '19 at 22:27

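`categorical_features` is deprecated (since scikit-learn 0.20, and removed in 0.22). Instead, transform your categorical feature directly:

```python
onehotencoder = OneHotEncoder(categories='auto')
# The encoder expects 2-D input, so reshape the single column into shape (n, 1).
# The result is a sparse matrix; call .toarray() if you need a dense array.
feature = onehotencoder.fit_transform(X[:, 3].reshape(-1, 1))
```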
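Put together, this approach might look like the sketch below. The toy array, its column layout, and the choice to place the dummy columns first are assumptions for illustration only; the sketch relies on OneHotEncoder accepting string categories directly, which holds for scikit-learn 0.20 and later:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy feature matrix; column 3 holds the fuel type as strings.
X = np.array([[23, 46986, 90, 'Diesel'],
              [24, 72937, 90, 'Petrol'],
              [26, 38500, 110, 'Diesel']], dtype=object)

# Encode only the categorical column. OneHotEncoder handles strings itself,
# so no LabelEncoder step is needed first.
onehotencoder = OneHotEncoder(categories='auto')
fuel = onehotencoder.fit_transform(X[:, 3].reshape(-1, 1)).toarray()

# Drop the original string column, convert the rest to float,
# and stack the dummy columns in front of it.
numeric = np.delete(X, 3, axis=1).astype(np.float64)
X_encoded = np.hstack([fuel, numeric])

print(onehotencoder.categories_)
print(X_encoded)
```

Dropping the string column before stacking keeps every column of the result numeric, so it can be fed straight into a model.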

Juan Carlos Ramirez

2,054
1
7
22

score 0 · Answer 2 · answered Apr 11 '20 at 07:29

This error occurs when X still contains a column of categories stored as strings. When I ran into it, the fix was to label-encode all the categorical columns in X (as you did for column 3) and only then apply the one-hot encoder to the column you want, here column 3.
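A minimal sketch of these two steps on hypothetical toy data. Because `categorical_features` no longer exists in scikit-learn 0.22 and later, the second step swaps in `ColumnTransformer`, which scikit-learn recommends as its replacement, instead of the original argument:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data: columns 1 and 3 both hold strings.
X = np.array([[23, 'Red', 90, 'Diesel'],
              [24, 'Blue', 90, 'Petrol'],
              [26, 'Red', 110, 'Diesel']], dtype=object)

# Step 1: label-encode every column that still contains strings,
# so the whole matrix can be converted to float afterwards.
for col in range(X.shape[1]):
    if any(isinstance(v, str) for v in X[:, col]):
        X[:, col] = LabelEncoder().fit_transform(X[:, col])
X = X.astype(np.float64)

# Step 2: one-hot encode the desired column (3). ColumnTransformer stands in
# for the removed categorical_features argument; sparse_threshold=0 forces a
# dense result, and the passthrough columns come after the encoded ones.
ct = ColumnTransformer([('fuel', OneHotEncoder(categories='auto'), [3])],
                       remainder='passthrough', sparse_threshold=0)
X_encoded = ct.fit_transform(X)

print(X_encoded)
```

Note that column 1 here is label-encoded but not one-hot encoded, exactly as the answer describes; that leaves an arbitrary integer ordering on it, so you may want to one-hot encode every categorical column in the same `ColumnTransformer`.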