Let's assume that I have a pandas dataframe with the following column names:
- 'age' (e.g. 33, 26, 51 etc.)
- 'seniority' (e.g. 'junior', 'senior' etc.)
- 'gender' (e.g. 'male', 'female')
- 'salary' (e.g. 32000, 40000, 64000 etc.)
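For a concrete reproduction, here is a small DataFrame matching that description. The values come from the examples above, but the way they are paired into rows is an assumption made only for illustration. Note that two columns, 'seniority' and 'gender', hold strings, not just the one being encoded.

```python
import pandas as pd

# Hypothetical sample rows built from the example values above;
# the pairing of values into rows is invented for illustration.
data = pd.DataFrame({
    "age": [33, 26, 51],
    "seniority": ["senior", "junior", "senior"],
    "gender": ["male", "female", "female"],
    "salary": [40000, 32000, 64000],
})

# Non-numeric columns: both string columns show up, not only 'seniority'
string_columns = data.select_dtypes(exclude="number").columns.tolist()
print(string_columns)  # ['seniority', 'gender']
```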
I want to transform the categorical variable 'seniority' into one-hot encoded values. To do this, I am doing the following:
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
data['seniority'] = label_encoder.fit_transform(data['seniority'])
from sklearn.preprocessing import OneHotEncoder
one_hot_encoder = OneHotEncoder(categorical_features=[1])
data = one_hot_encoder.fit_transform(data.values)
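For reference, a minimal sketch of the same intent with the newer scikit-learn API, on a hypothetical frame built from the example values. The categorical_features argument was deprecated in scikit-learn 0.20 and removed in 0.22 in favour of sklearn.compose.ColumnTransformer, and since 0.20 OneHotEncoder accepts string categories directly, so the LabelEncoder step is not needed here. remainder="passthrough" keeps the other columns unchanged and sparse_threshold=0 forces a dense result; both are choices made for this sketch.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample data (row pairing invented for illustration)
data = pd.DataFrame({
    "age": [33, 26, 51],
    "seniority": ["senior", "junior", "senior"],
    "gender": ["male", "female", "female"],
    "salary": [40000, 32000, 64000],
})

# Encode only 'seniority' (selected by name, not position);
# every other column is passed through untouched.
transformer = ColumnTransformer(
    [("seniority", OneHotEncoder(), ["seniority"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
encoded = transformer.fit_transform(data)

# Column order: seniority_junior, seniority_senior (categories are sorted),
# then the passthrough columns age, gender, salary.
print(encoded)
```

The result is a NumPy object array because 'gender' is still passed through as strings; it would need its own encoding (or dropping) if a purely numeric matrix is required.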
But then I get the following error:
ValueError: could not convert string to float: 'gender'
at the line
data = one_hot_encoder.fit_transform(data.values)
However, I have explicitly specified categorical_features=[1], so only column 1 ('seniority') should be considered for this one-hot encoding.
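As far as I can tell, the legacy OneHotEncoder with categorical_features validated the entire input array as float before selecting the listed columns, so every column had to be numeric, not only column 1. The sketch below reproduces just that conversion step with plain NumPy on hypothetical data; it mimics the LabelEncoder step with a simple mapping and is not scikit-learn's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (row pairing invented for illustration)
data = pd.DataFrame({
    "age": [33, 26, 51],
    "seniority": ["senior", "junior", "senior"],
    "gender": ["male", "female", "female"],
    "salary": [40000, 32000, 64000],
})

# Stand-in for the LabelEncoder step: 'seniority' becomes integers
data["seniority"] = data["seniority"].map({"junior": 0, "senior": 1})

# Converting the whole array to float fails on 'gender', which still
# holds strings, before any column selection could take place.
try:
    np.asarray(data.values, dtype=np.float64)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```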
How can I fix this error (other than, for example, dropping the 'gender' column)?
I was using pandas.get_dummies in the past and did not have this problem.
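For comparison, pandas.get_dummies works directly on string columns, and its columns parameter limits the encoding to the listed columns while leaving the others, including 'gender', as they are. A minimal sketch on the same hypothetical data; dtype=int is passed only so the indicator columns are 0/1 integers across pandas versions (recent versions default to bool).

```python
import pandas as pd

# Hypothetical sample data (row pairing invented for illustration)
data = pd.DataFrame({
    "age": [33, 26, 51],
    "seniority": ["senior", "junior", "senior"],
    "gender": ["male", "female", "female"],
    "salary": [40000, 32000, 64000],
})

# Only 'seniority' is expanded into indicator columns; the untouched
# columns come first, followed by the new dummy columns.
encoded = pd.get_dummies(data, columns=["seniority"], dtype=int)
print(list(encoded.columns))
# ['age', 'gender', 'salary', 'seniority_junior', 'seniority_senior']
```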