scikit-learn labelencoder unseen values

Question

uv = np.unique(X[:, 2])
uv2 = np.unique(X_test[:, 2])

print(uv)
#['Female' 'Male']

print(uv2)
#['Female' 'Male']

# Encoding categorical columns in the train dataset
from sklearn.preprocessing import LabelEncoder
labelencoder_X = LabelEncoder()
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])  # Encoding column 2

# Encoding categorical columns in the test dataset
X_test[:, 2] = labelencoder_X.transform(X_test[:, 2])  # Encoding column 2

Result of last command:

ValueError: y contains previously unseen labels: 'Male'

I tried to mask the unseen values, but then X_test is empty after encoding.

score 0 · Answer 1 · answered Jun 22 '23 at 19:55


Fit the LabelEncoder on your whole dataset.
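A minimal sketch of that first option, assuming the train and test columns can simply be concatenated before fitting. The arrays `train_col` and `test_col` are made-up stand-ins for `X[:, 2]` and `X_test[:, 2]`, and the extra 'Other' value is only there to illustrate a label missing from the train set:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for X[:, 2] and X_test[:, 2]
train_col = np.array(["Female", "Male", "Female"])
test_col = np.array(["Male", "Other", "Female"])  # 'Other' never appears in train

# Fit on every value from both sets so no label is unseen at transform time
encoder = LabelEncoder()
encoder.fit(np.concatenate([train_col, test_col]))

train_encoded = encoder.transform(train_col)
test_encoded = encoder.transform(test_col)

print(encoder.classes_)
print(train_encoded, test_encoded)
```

Keep in mind that this only prevents the error. The model still never sees a training example for a category that exists only in the test set.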

Alternatively, drop the test rows whose value in that column was not seen during fitting, since your model will not know what to do with a category that is absent from your train dataset.
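For the row-dropping alternative, something like the following might work. It is only a sketch: the small `X` and `X_test` arrays are invented for illustration, and it relies on the fitted encoder's `classes_` attribute to decide which test rows to keep:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical data; column 2 holds the categorical value
X = np.array([[1, 10, "Female"], [2, 20, "Male"]], dtype=object)
X_test = np.array(
    [[3, 30, "Male"], [4, 40, "Other"], [5, 50, "Female"]], dtype=object
)

labelencoder_X = LabelEncoder()
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])

# Build a boolean mask of test rows whose label was seen during fit
seen = set(labelencoder_X.classes_)
known = np.array([value in seen for value in X_test[:, 2]])

# Boolean indexing returns a copy, so X_test itself is left unchanged
X_test_kept = X_test[known]
X_test_kept[:, 2] = labelencoder_X.transform(X_test_kept[:, 2])

print(X_test_kept)
```

If the mask removes every row, as described in the question, it is worth printing `labelencoder_X.classes_` next to `np.unique(X_test[:, 2])` to check whether the values really match (stray whitespace or a column that was already encoded are possible causes, though that is a guess).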

answered Jun 22 '23 at 19:55

Chris
