PyCaret setup for one-hot encoding

Question

I'm stuck on a problem with one-hot encoding of categorical variables in PyCaret. Even though I explicitly declare my categorical variables, the pipeline applies what looks like normalization to them, and I have no idea what I'm doing wrong.

At first, everything works fine with the code below:

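```python
from pycaret.classification import *
from pycaret.datasets import get_data
import pandas as pd
import numpy as np
import seaborn as sns

# Load the "income" sample dataset shipped with PyCaret
dataset = get_data('income')
dataset.dtypes  # inspect column types (displayed when run in a notebook)
```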
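One plausible explanation, assuming PyCaret 3.x: `setup` chooses an encoder for each categorical column based on its number of unique values. As I understand the 3.x `setup` docstring, columns with exactly two classes are encoded ordinally, columns with at most `max_encoding_ohe` unique values are one-hot encoded, and the remaining columns go to `encoding_method` (a target encoder by default); the default threshold has differed between releases, so check the docstring of your installed version. The sketch below mimics that rule with a hypothetical helper `encoding_plan` on a made-up toy frame, so you can predict which columns would be one-hot encoded. It is not PyCaret's own code.

```python
import pandas as pd


def encoding_plan(df, cat_cols, max_encoding_ohe):
    """Hypothetical helper: guess which encoder each categorical column gets,
    assuming the max_encoding_ohe rule described above (not PyCaret's code)."""
    plan = {}
    for col in cat_cols:
        n_unique = df[col].nunique(dropna=True)
        if n_unique == 2:
            plan[col] = "ordinal"  # binary columns: always ordinal (assumed rule)
        elif max_encoding_ohe < 0 or n_unique <= max_encoding_ohe:
            plan[col] = "one-hot"  # few enough levels (or threshold disabled)
        else:
            plan[col] = "encoding_method"  # e.g. a target encoder by default
    return plan


# Toy frame standing in for the income data (values are illustrative only)
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male", "Male"],
    "race": ["White", "Black", "Asian-Pac-Islander", "Other", "Amer-Indian-Eskimo", "White"],
    "workclass": ["Private", "Self-emp-inc", "Self-emp-not-inc",
                  "State-gov", "Local-gov", "Federal-gov"],
})

plan = encoding_plan(toy, ["sex", "race", "workclass"], max_encoding_ohe=5)
print(plan)
# {'sex': 'ordinal', 'race': 'one-hot', 'workclass': 'encoding_method'}
```

With a threshold of 5, this rule would reproduce the pattern described in the question (only race one-hot encoded) if race is the only multi-level categorical column with at most 5 categories; `dataset[cat_cols].nunique()` on the real data shows the actual counts. If your PyCaret version supports it, passing a larger `max_encoding_ohe` (or a negative value) to `setup` should one-hot encode the remaining columns as well.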

The problem appears once I run setup and look at the transformed training data:

```python
exp_clf01 = setup(data=dataset,
                  target='income >50K',
                  session_id=123,
                  numeric_features=['age', 'education-num', 'capital-gain',
                                    'capital-loss', 'hours-per-week'],
                  categorical_features=['workclass', 'education', 'marital-status', 'occupation',
                                        'relationship', 'race', 'sex', 'native-country'])

# Inspect the training features returned by get_config
df_transformed = get_config("X_train")
df_transformed.head()
```

Looking at the head of the resulting DataFrame, one-hot encoding has been applied only to the race column. The other categorical inputs have been replaced by numbers between 0 and 1 that look normalized, and I can't figure out why:

age	workclass	education	education-num	marital-status	occupation	other columns
46.0	0.303273	0.271186	11.0	0.101942	0.484643	...
27.0	0.218620	0.412939	13.0	0.044165	0.484643	...
33.0	0.218557	0.568315	14.0	0.448894	0.455449	...
60.0	0.218557	0.412673	13.0	0.448894	0.484286	...
25.0	0.218620	0.063798	6.0	0.044165	0.229692	...
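Within each column above, the same value repeats across rows (for example 0.448894 and 0.044165 in marital-status, 0.484643 in occupation), which points to a per-category encoding such as target encoding rather than scaling of raw values. Assuming that is what happened, the sketch below shows plain mean target encoding with pandas: each category is replaced by the mean of the target over the rows in that category. The helper `mean_target_encode` and the toy data are hypothetical, and library target encoders (such as the one in category_encoders) usually add smoothing, so their numbers will differ.

```python
import pandas as pd


def mean_target_encode(train, col, target):
    """Hypothetical helper: replace each category in `col` with the mean of
    `target` over the training rows in that category (no smoothing)."""
    category_means = train.groupby(col)[target].mean()
    return train[col].map(category_means)


# Toy data (illustrative only): 1 means income >50K
toy = pd.DataFrame({
    "workclass": ["Private", "Private", "Private", "Private", "Self-emp", "Self-emp"],
    "income >50K": [0, 0, 0, 1, 1, 0],
})
toy["workclass_encoded"] = mean_target_encode(toy, "workclass", "income >50K")
print(toy)
```

Every Private row gets 0.25 and every Self-emp row gets 0.5, the same kind of repeating fractional pattern seen in the workclass, marital-status and occupation columns of the output above.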

How can I prevent this behavior?
