
I am trying to build an ML model using PyCaret. I used the setup call below:

clf = setup(data = df.loc[:, df.columns != 'ID'], target='final_label',session_id=123, 
            categorical_features=['Gender','Country'], 
            fold_strategy='stratifiedkfold', 
            fold=5, fold_shuffle=True, n_jobs=-1, 
            create_clusters=False,polynomial_features=False, 
            polynomial_degree=2, trigonometry_features=False, polynomial_threshold=0.1, 
            remove_multicollinearity=True, multicollinearity_threshold=0.90)

This initializes the process with a list of variables, from which I wish to extract transformed_train_set and transformed_test_set.


I would like to export the train and test data before and after transformation, but PyCaret doesn't seem to offer a way to export this data.

When I try the code below:

train_data = predict_model(rft,data = X_train,raw_score=True)
train_data['phase'] = 'train'
test_data = predict_model(rft,data = X_test,raw_score=True)
test_data['phase'] = 'test'

it throws an error:

NameError: name 'X_train' is not defined
desertnaut
The Great

1 Answer


You can export the train and test data before and after transformation using get_config(variable).

from pycaret.datasets import get_data
from pycaret.classification import *
data = get_data('diabetes', verbose=False)
s = setup(data, target = 'Class variable', session_id = 123, normalize=True, verbose=False)
rf = create_model('rf')

# with no argument, get_config() lists all available variables
get_config()

X_train = get_config('X_train')
X_train_transformed = get_config('X_train_transformed')

X_test = get_config('X_test')
X_test_transformed = get_config('X_test_transformed')

train_data = predict_model(rf, data = X_train,raw_score=True)
train_data['phase'] = 'train'
train_transformed_data = predict_model(rf, data = X_train_transformed,raw_score=True)
train_transformed_data['phase'] = 'train_transformed'

test_data = predict_model(rf, data = X_test,raw_score=True)
test_data['phase'] = 'test'
test_transformed_data = predict_model(rf, data = X_test_transformed,raw_score=True)
test_transformed_data['phase'] = 'test_transformed'
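
To actually write these frames to disk, a small helper along the following lines may be enough. This is only a sketch: export_splits, the exports directory, and the one-CSV-per-frame layout are hypothetical choices of mine, not part of PyCaret. It assumes each value is a pandas DataFrame such as those returned by get_config above.

```python
from pathlib import Path
from typing import Any, List, Mapping


def export_splits(frames: Mapping[str, Any], out_dir: str = "exports") -> List[Path]:
    """Write each frame to <out_dir>/<name>.csv (hypothetical helper).

    `frames` maps a label such as "X_train" to a pandas DataFrame.
    A `phase` column holding that label is added to each exported copy.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in frames.items():
        path = target / f"{name}.csv"
        # assign() returns a copy, so the caller's frame is left untouched
        frame.assign(phase=name).to_csv(path, index=False)
        written.append(path)
    return written
```

With the variables defined above, it could be called as export_splits({'X_train': X_train, 'X_train_transformed': X_train_transformed, 'X_test': X_test, 'X_test_transformed': X_test_transformed}).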
Tatchai S.
  • For some reason this doesn't work in PyCaret 2.3.10 – The Great May 17 '23 at 09:55
  • PyCaret 2.3.10 only has three of these: X (the transformed dataset), X_train (the transformed train dataset), and X_test (the transformed test/holdout dataset). It doesn't have the train and test data from before the transformation. You can find all available parameters at https://github.com/pycaret/pycaret/blob/2.3.10/pycaret/classification.py#L2699 – Tatchai S. May 17 '23 at 12:49