Considering data like:
from sklearn.preprocessing import OneHotEncoder
import numpy as np
dt = 'object, i4, i4'
d = np.array([('aaa', 1, 1), ('bbb', 2, 2)], dtype=dt)
I want to exclude the text column using the OHE functionality.
Why does the following not work?
ohe = OneHotEncoder(categorical_features=np.array([False,True,True], dtype=bool))
ohe.fit(d)
ValueError: could not convert string to float: 'bbb'
It says in the documentation:
categorical_features: “all” or array of indices or mask :
Specify what features are treated as categorical.
‘all’ (default): All features are treated as categorical.
array of indices: Array of categorical feature indices.
mask: Array of length n_features and with dtype=bool.
I'm using a mask, yet it still tries to convert to float.
Even using
ohe = OneHotEncoder(categorical_features=np.array([False,True,True], dtype=bool),
dtype=dt)
ohe.fit(d)
Same error.
And also in the case of "array of indices":
ohe = OneHotEncoder(categorical_features=np.array([1, 2]), dtype=dt)
ohe.fit(d)