How to use CatBoost to encode a dataset?

Question

There is a package based on the CatBoost algorithm, category_encoders [https://contrib.scikit-learn.org/category_encoders/_modules/category_encoders/cat_boost.html#CatBoostEncoder], that claims to use the CatBoost algorithm to encode datasets. However, it does not offer all the features of the catboost package that allow refining the model training phase. After training a model, I'm trying to find a way to use catboost itself to transform the categorical variables of a dataset. Could you help me with that?

score 0 · Answer 1 · answered Nov 05 '22 at 09:55

# If you want to test this on your local notebook
# http://contrib.scikit-learn.org/categorical-encoding/
!pip install category-encoders

# import libraries
import pandas as pd
  
# Make dataset based on Josh Starmer video example
# https://www.youtube.com/watch?v=EzjtTh-WUWY

categorical = ["Blue","Red","Green","Blue","Green","Green","Blue"]
numerical   = [1.72,  1.32,    1.81,  1.56,   1.64,   1.61,  1.73]
Label       = [1   ,    0 ,      1 ,    0 ,     1 ,     0 ,     0]
df = pd.DataFrame({
                   'favorite_color':categorical,
                   'Hight(m)':numerical,
                   'LovesTroll2':Label,
                   })
 
feature_list = list(df.columns) #['favorite_color', 'Hight(m)', 'LovesTroll2']

%%time
# import libraries (start a new notebook cell here: %%time must be the first line of a cell)
from category_encoders.cat_boost import CatBoostEncoder
import category_encoders as ce

# Define catboost encoder
cbe_encoder = ce.cat_boost.CatBoostEncoder() #approach1
CBE_encoder = CatBoostEncoder()              #approach2

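# Fit encoder and transform the features
# Note: feature_list also contains the target column 'LovesTroll2'; it is numeric,
# so it passes through unchanged and only 'favorite_color' is encoded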
train_cbe = cbe_encoder.fit_transform(df[feature_list], df[feature_list[-1]]) #approach1
Train_cbe = CBE_encoder.fit_transform(df[feature_list], df[feature_list[-1]]) #approach2
#print(Train_cbe) 
#   favorite_color  Hight(m)  LovesTroll2
#0        0.428571      1.72            1
#1        0.428571      1.32            0
#2        0.428571      1.81            1
#3        0.714286      1.56            0
#4        0.714286      1.64            1
#5        0.809524      1.61            0
#6        0.476190      1.73            0

# plot the encoded results over target/label
#train_cbe.plot(style=['o','rx'])

import matplotlib.pyplot as plt
plt.scatter(Train_cbe['LovesTroll2'], Train_cbe['favorite_color'])
plt.show()
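
To see where the encoded values come from, here is a minimal sketch that recomputes the `favorite_color` column by hand. It assumes the encoder walks the rows in order and replaces each category with (sum of earlier targets for that category + prior * a) / (count of earlier rows for that category + a), where the prior is the mean of the target and the smoothing value `a` is 1. The function name `ordered_target_encode` and the parameter name `a` are made up for this illustration and are not part of category_encoders.

```python
# Hand-rolled ordered target statistic, for illustration only.
# Assumption: prior = mean of the target, smoothing a = 1.

colors = ["Blue", "Red", "Green", "Blue", "Green", "Green", "Blue"]
labels = [1, 0, 1, 0, 1, 0, 0]

def ordered_target_encode(categories, targets, a=1.0):
    """Encode each row using only the targets of the rows that came before it."""
    prior = sum(targets) / len(targets)   # global mean of the target
    sums, counts = {}, {}                 # running target sum and row count per category
    encoded = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        # The current row's own label is not used, which limits target leakage
        encoded.append((s + prior * a) / (n + a))
        sums[cat] = s + y
        counts[cat] = n + 1
    return encoded

manual = ordered_target_encode(colors, labels)
for color, value in zip(colors, manual):
    print(f"{color:<6} {value:.6f}")
```

On this toy dataset the values agree with the `favorite_color` column printed above, for example (1 + 3/7) / (1 + 1) = 0.714286 for the second "Blue" row. Because each value depends on the rows before it, shuffling the training rows changes the encoding.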
