
I am trying to convert the categorical columns of my dataset into numerical values using LabelEncoder. (Screenshot: dataset)

Here is the conversion code:

from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
for i in cat_columns:
    df[i] = encoder.fit_transform(df[i])

After the conversion, the dataset looks like this: (Screenshot: dataset after transformation)
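For reference, this is roughly what LabelEncoder does to one column. It is a minimal sketch on a hypothetical `gender` column, since the real data is only shown as a screenshot. The encoder stores the sorted unique labels in `classes_`, and each label is replaced by its index in that array.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column; the real dataset is only available as a screenshot.
gender = pd.Series(["Male", "Female", "Female", "Male", "Other"])

encoder = LabelEncoder()
codes = encoder.fit_transform(gender)

# classes_ holds the sorted unique labels; each label's code is its index there.
print(encoder.classes_.tolist())  # ['Female', 'Male', 'Other']
print(codes.tolist())             # [1, 0, 0, 1, 2]
```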

The problem is that whenever I try to transform my test dataset, I get this error:

y contains previously unseen labels: 'Male'

Code for transforming the test data:

for i in cat_columns:
    df1[i] = encoder.transform(df1[i])

(Screenshot: test data)

How can I solve this problem?

desertnaut
imtinan

1 Answer


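I guess the problem is that you are using the same encoder to fit all the different columns. Every call to fit_transform refits that encoder and replaces the classes it learned from the previous column, so after the loop it only knows the labels of the last column. You should instead fit each column with its own encoder. For example, you can use a dictionary to store the different encoders: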

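from sklearn import preprocessing

# Fit a separate LabelEncoder for each categorical column and keep it
encoders = {}
for i in cat_columns:
    encoders[i] = preprocessing.LabelEncoder()
    df[i] = encoders[i].fit_transform(df[i])

# Transform the test data with the encoder fitted on the same column
for i in cat_columns:
    df1[i] = encoders[i].transform(df1[i])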
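Here is a self-contained version of the same approach. It is a sketch under assumptions: the small `df`/`df1` frames and their values are hypothetical stand-ins for the question's training and test data, and only the two column names mentioned in this answer are used.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical train/test frames standing in for the question's df and df1.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "smoking_status": ["never smoked", "smokes", "formerly smoked", "never smoked"],
})
df1 = pd.DataFrame({
    "gender": ["Female", "Male"],
    "smoking_status": ["smokes", "never smoked"],
})
cat_columns = ["gender", "smoking_status"]

# One LabelEncoder per column, keyed by column name.
encoders = {}
for col in cat_columns:
    encoders[col] = preprocessing.LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Reuse each column's own fitted encoder on the test data.
for col in cat_columns:
    df1[col] = encoders[col].transform(df1[col])

print(df1)
```

Note that `transform` will still raise the same kind of error if the test data contains a category that never appears in the corresponding training column.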

The error you encounter (previously unseen labels: 'Male') occurs because you are transforming the gender column with the single encoder as it was last fitted in your first loop, which in your case is probably on the smoking_status column. That encoder has never seen the label 'Male'.
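The failure can be reproduced in isolation. This minimal sketch assumes a hypothetical two-column frame and a column order where `gender` is encoded before `smoking_status`. After the loop, the shared encoder only knows the smoking_status labels, so transforming a gender value fails with a `ValueError`. The exact formatting of the label in the message can vary between scikit-learn/NumPy versions, so only the message is printed here.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data with the two columns named in the answer.
df = pd.DataFrame({
    "gender": ["Male", "Female"],
    "smoking_status": ["never smoked", "smokes"],
})

encoder = LabelEncoder()
for col in ["gender", "smoking_status"]:
    # Each fit_transform call refits the same object, replacing classes_.
    df[col] = encoder.fit_transform(df[col])

# Only the labels of the last column survive the loop.
print(encoder.classes_.tolist())  # ['never smoked', 'smokes']

message = None
try:
    encoder.transform(["Male"])
except ValueError as err:
    # The shared encoder was last fitted on smoking_status, so 'Male' is unknown.
    message = str(err)
print(message)
```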

Andrea