y contains previously unseen labels: 'Male' in Label encoder

Question

I am trying to convert the categorical columns of my dataset to numerical values using LabelEncoder (image: dataset).

Here is the conversion code:

for i in cat_columns:
    df[i]=encoder.fit_transform(df[i])

After conversion, the dataset looks like this (image: dataset after transformation).

But whenever I try to transform my test dataset, it raises the following error:

y contains previously unseen labels: 'Male'

Code for the transformation on the test data:

for i in cat_columns:
    df1[i]=encoder.transform(df1[i])

(image: test data)

How can I solve this problem?

Andrea · Accepted Answer · 2021-02-25T09:38:05.710

The problem is most likely that you are using the same encoder to fit all the columns. Each call to fit_transform refits the encoder, so afterwards it only remembers the classes of the last column it saw. Instead, fit each column with its own encoder. For example, you can store the encoders in a dictionary:

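from sklearn import preprocessing

# Fit one LabelEncoder per categorical column on the training data.
encoders = {}
for i in cat_columns:
    encoders[i] = preprocessing.LabelEncoder()
    df[i] = encoders[i].fit_transform(df[i])

# Reuse each column's own fitted encoder on the test data.
for i in cat_columns:
    df1[i] = encoders[i].transform(df1[i])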

The error you encounter (previously unseen labels: 'Male') occurs because you are transforming the gender column with the last encoder fitted in the previous for loop, which in your case might be the smoking_status label encoder.
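To make the cause concrete, here is a minimal sketch that reproduces the error and then applies the per-column fix. The toy DataFrames and their values are made up for illustration (they are not the asker's data), and the column names gender and smoking_status are only assumed from the description above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; column names and values are illustrative only.
train = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "smoking_status": ["smokes", "never smoked", "smokes"],
})
test = pd.DataFrame({
    "gender": ["Male", "Female"],
    "smoking_status": ["never smoked", "smokes"],
})
cat_columns = ["gender", "smoking_status"]

# Shared encoder: every fit_transform call replaces the learned classes,
# so after the loop it only knows the labels of the last column.
shared = LabelEncoder()
for col in cat_columns:
    shared.fit_transform(train[col])
shared_classes = list(shared.classes_)

# Transforming the gender column with it fails with a ValueError.
try:
    shared.transform(test["gender"])
    shared_failed = False
except ValueError as err:
    shared_failed = True
    print(err)

# Fix: one encoder per column, fitted on train and reused on test.
encoders = {col: LabelEncoder().fit(train[col]) for col in cat_columns}
encoded_test = test.copy()
for col in cat_columns:
    encoded_test[col] = encoders[col].transform(test[col])
print(encoded_test)
```

If the test set contains a category that never occurs in the training column, the same error will still appear even with per-column encoders, because that label really is unseen.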
