
I'm stuck on a problem with one-hot encoding of categorical variables in PyCaret. Even though I explicitly declare my categorical variables, the pipeline seems to apply normalization to them, and I have no idea what I'm doing wrong.

Loading the data with the code below works fine:

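```python
from pycaret.classification import *
from pycaret.datasets import get_data
import pandas as pd
import numpy as np
import seaborn as sns

# Load the 'income' sample dataset shipped with PyCaret and check the column types
dataset = get_data('income')
print(dataset.dtypes)
```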

The problem appears once I run `setup()` and inspect the transformed training set:

```python
# Declare the numeric and categorical columns explicitly
exp_clf01 = setup(data=dataset,
                  target='income >50K',
                  session_id=123,
                  numeric_features=['age', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week'],
                  categorical_features=['workclass', 'education', 'marital-status', 'occupation',
                                        'relationship', 'race', 'sex', 'native-country'])

# Inspect the preprocessed training features
df_transformed = get_config("X_train")
print(df_transformed.head())
```

When I look at the head of the transformed DataFrame, one-hot encoding has only been applied to the `race` column. The other categorical columns seem to have been normalized instead (their values all fall between 0 and 1), and I can't figure out why.

| age | workclass | education | education-num | marital-status | occupation | … (other columns) |
|---|---|---|---|---|---|---|
| 46.0 | 0.303273 | 0.271186 | 11.0 | 0.101942 | 0.484643 | … |
| 27.0 | 0.218620 | 0.412939 | 13.0 | 0.044165 | 0.484643 | … |
| 33.0 | 0.218557 | 0.568315 | 14.0 | 0.448894 | 0.455449 | … |
| 60.0 | 0.218557 | 0.412673 | 13.0 | 0.448894 | 0.484286 | … |
| 25.0 | 0.218620 | 0.063798 | 6.0 | 0.044165 | 0.229692 | … |
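One thing I noticed: identical categories map to identical values (e.g. 0.484643 appears twice in `occupation`, 0.448894 twice in `marital-status`), and everything stays between 0 and 1. That looks to me more like a target-based encoding than scaling, though this is my assumption and not something I've confirmed. As an illustration only, here is a plain mean-target encoding on hypothetical data; `mean_target_encode` is my own helper, and PyCaret's actual encoder may add smoothing or other adjustments, so its numbers would not match this exactly.

```python
from collections import defaultdict


def mean_target_encode(categories, targets):
    """Map each category to the mean of a 0/1 target over the rows where it occurs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        totals[cat] += y
        counts[cat] += 1
    means = {cat: totals[cat] / counts[cat] for cat in totals}
    # Every row gets its category's mean, so equal categories get equal values
    return [means[cat] for cat in categories]


# Hypothetical column and binary target standing in for 'income >50K'
occupation = ["Sales", "Tech-support", "Sales", "Craft-repair", "Tech-support", "Sales"]
income_gt_50k = [1, 0, 0, 0, 1, 1]

encoded = mean_target_encode(occupation, income_gt_50k)
print(encoded)
```

Because each value is a per-category share of positive targets, it always lands in [0, 1], which is why such columns can be mistaken for normalized ones.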

How can I prevent this behavior?
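My current (unconfirmed) guess is that it depends on the number of categories. As I read the PyCaret 3.x `setup()` documentation, categorical columns with at most `max_encoding_ohe` unique values are one-hot encoded, two-class columns are encoded ordinally, columns with more categories go to the `encoding_method` encoder, and a negative `max_encoding_ohe` forces one-hot encoding. With a threshold of 5, that would explain why only `race` (5 categories) was one-hot encoded. The plain-Python sketch below reproduces that rule on hypothetical rows. `count_categories`, `choose_encoding` and the threshold of 5 are my own illustrative names and values, not PyCaret's API, and the real default threshold may differ between versions.

```python
from collections import defaultdict


def count_categories(rows, columns):
    """Count the distinct values seen in each of the given columns."""
    seen = defaultdict(set)
    for row in rows:
        for col in columns:
            seen[col].add(row[col])
    return {col: len(seen[col]) for col in columns}


def choose_encoding(n_unique, max_encoding_ohe=5):
    """Hypothetical rule modelled on my reading of the max_encoding_ohe docs."""
    if n_unique == 2:
        return "ordinal"   # binary columns become a single 0/1 column
    if max_encoding_ohe < 0 or n_unique <= max_encoding_ohe:
        return "one-hot"
    return "target"        # higher-cardinality columns get a target-based encoder


# A few hypothetical rows shaped like the 'income' data
rows = [
    {"race": "White", "sex": "Male", "workclass": "Private"},
    {"race": "Black", "sex": "Female", "workclass": "Self-emp-not-inc"},
    {"race": "Asian-Pac-Islander", "sex": "Male", "workclass": "Local-gov"},
    {"race": "Amer-Indian-Eskimo", "sex": "Female", "workclass": "State-gov"},
    {"race": "Other", "sex": "Male", "workclass": "Federal-gov"},
    {"race": "White", "sex": "Female", "workclass": "Self-emp-inc"},
]

counts = count_categories(rows, ["race", "sex", "workclass"])
plan = {col: choose_encoding(n) for col, n in counts.items()}
print(counts)
print(plan)
```

Under this rule, calling `choose_encoding(n, max_encoding_ohe=-1)` sends every non-binary column to one-hot encoding, which is the behaviour I was expecting from `categorical_features` in the first place.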

Reda El Hail
