I have a 2-D NumPy array with categorical data in every column.
I am trying to encode each column separately while also handling values in the test set that were not seen during training.
This is my code:
from sklearn.preprocessing import LabelEncoder

for column in range(X_train.shape[1]):
    label_encoder = LabelEncoder()
    X_train[:, column] = label_encoder.fit_transform(X_train[:, column])
    mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
    map_function = lambda x: mappings.get(x, -1)
    X_test[:, column] = map_function(X_test[:, column])
and I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-***********> in <module>
39 mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
40 map_function = lambda x: mappings.get(x, -1)
---> 41 X_test[:, column] = map_function(X_test[:, column])
42
43
<ipython-input-***********> in <lambda>(x)
38 X_train[:, column] = label_encoder.fit_transform(X_train[:, column])
39 mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
---> 40 map_function = lambda x: mappings.get(x, -1)
41 X_test[:, column] = map_function(X_test[:, column])
42
TypeError: unhashable type: 'numpy.ndarray'
How can I fix this?
In general, would you suggest a better way to do what I want to do?
P.S.
I tried to do this to see what is happening:
for column in range(X_train.shape[1]):
    label_encoder = LabelEncoder()
    X_train[:, column] = label_encoder.fit_transform(X_train[:, column])
    mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
    try:
        map_function = lambda x: mappings.get(x, -1)
        X_test[:, column] = map_function(X_test[:, column])
    except:
        print(X_test[:, column])
        for i in range(X_test[:, column].shape[0]):
            if isinstance(X_test[i, column], np.ndarray):
                print(X_test[i, column])
        print()
However, print(X_test[i, column]) never printed anything, so I am not sure there is any NumPy array inside X_test[:, column].
I also checked with if not isinstance(X_test[i, column], str), and again nothing was printed, so every element of X_train[:, column], for every column, must be a string.
P.S.2
When I do this:
for i in range(X_test[:, column].shape[0]):
    X_test[i, column] = mappings.get(X_test[i, column], -1)
this works without any error, which means that, for some reason, the way I defined the lambda function passes the whole NumPy array to it rather than each of its elements separately.
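To illustrate what I think is happening, here is a minimal sketch with made-up data. The small `mappings` dictionary and `test_column` array are hypothetical stand-ins for my real `mappings` and `X_test[:, column]`. Calling the lambda on the column hands the whole array to `dict.get` as a single key, while calling it once per element behaves as I intended:

```python
import numpy as np

# Hypothetical stand-ins for `mappings` and X_test[:, column] (not my real data).
mappings = {"a": 0, "b": 1, "c": 2}
test_column = np.array(["a", "c", "z"], dtype=object)

map_function = lambda x: mappings.get(x, -1)

# Calling the lambda on the column passes the ndarray itself as the dictionary key,
# and an ndarray cannot be hashed, hence the TypeError.
try:
    map_function(test_column)
    whole_array_error = None
except TypeError as exc:
    whole_array_error = str(exc)

# Calling the lambda once per element looks up each string on its own;
# values missing from the mapping fall back to -1.
encoded = np.array([map_function(value) for value in test_column])

print(whole_array_error)
print(encoded)
```

If this reading is right, the individual elements really are strings, and the array mentioned in the error message is the column itself.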