I'm stuck on a problem with one-hot encoding of categorical variables in PyCaret. Even though I declare my categorical variables explicitly, the pipeline seems to apply normalization to them, and I have no idea what I'm doing wrong.
At first, with the code below, everything works fine:
from pycaret.classification import *
from pycaret.datasets import get_data
import pandas as pd
import numpy as np
import seaborn as sns
dataset = get_data('income')
dataset.dtypes
The problem appears once I run the setup and look at the transformed training data:
exp_clf01 = setup( data = dataset
, target = 'income >50K'
, session_id = 123
, numeric_features = ['age','education-num','capital-gain','capital-loss','hours-per-week']
, categorical_features = ['workclass','education','marital-status','occupation','relationship','race','sex','native-country']
)
df_transformed = get_config("X_train")
df_transformed.head()
When I look at the head of the transformed data frame, only the column race has been one-hot encoded.
The other categorical inputs appear to have been normalized, and I can't figure out why.
age | workclass | education | education-num | marital-status | occupation | other columns |
---|---|---|---|---|---|---|
46.0 | 0.303273 | 0.271186 | 11.0 | 0.101942 | 0.484643 | ... |
27.0 | 0.218620 | 0.412939 | 13.0 | 0.044165 | 0.484643 | ... |
33.0 | 0.218557 | 0.568315 | 14.0 | 0.448894 | 0.455449 | ... |
60.0 | 0.218557 | 0.412673 | 13.0 | 0.448894 | 0.484286 | ... |
25.0 | 0.218620 | 0.063798 | 6.0 | 0.044165 | 0.229692 | ... |
How can I prevent this behavior?
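
One possible reading of the table above: the fractional values may not be normalization at all, but a target-style encoding applied to categorical columns that have more distinct values than some one-hot threshold. The sketch below is only an illustration of that idea in plain Python, not PyCaret's actual implementation. The function names, the `max_ohe` parameter, and the toy data are all hypothetical, and the plain per-category mean is a simplification of what real target encoders do (they usually add smoothing).

```python
from collections import defaultdict


def one_hot(values):
    """Return one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return {f"is_{c}": [1.0 if v == c else 0.0 for v in values] for c in categories}


def target_mean_encode(values, target):
    """Replace each category with the mean target value of that category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, target):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]


def encode(values, target, max_ohe):
    """Hypothetical rule: one-hot up to max_ohe distinct values, else target mean."""
    if len(set(values)) <= max_ohe:
        return one_hot(values)
    return {"encoded": target_mean_encode(values, target)}


# Toy data (made up for illustration, not taken from the income dataset).
target = [1, 0, 1, 0, 0, 1]
race = ["A", "B", "A", "B", "A", "B"]              # 2 distinct values
education = ["hs", "ba", "ms", "hs", "ba", "phd"]  # 4 distinct values

low = encode(race, target, max_ohe=3)        # few categories: 0/1 columns
high = encode(education, target, max_ohe=3)  # many categories: fractions

print(low)
print(high)
```

If that is what is happening here, it would explain why a low-cardinality column such as race gets 0/1 columns while workclass or education show fractions between 0 and 1. In PyCaret 3.x, `setup` accepts a `max_encoding_ohe` argument that controls this kind of threshold. Assuming that version is in use, passing a value larger than the highest cardinality among the categorical columns may keep everything one-hot encoded.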