
PyCaret seems like a great AutoML tool. It works fast and is simple to use, and I would like to download the generated pipeline code into .py files to double-check it and, if needed, customize some parts. Unfortunately, I don't know how to do this, and reading the documentation has not helped. Is it possible or not?

Ivan Shelonik

3 Answers

It is not possible to get the underlying code, since PyCaret takes care of this for you. But it is up to you as the user to decide the steps that you want your flow to take, e.g.

# Setup experiment with user-defined options for preprocessing, etc.
setup(...) 

# Create a model (uses training split only)
model = create_model("lr")

# Tune hyperparameters (user can pass a custom tuning grid if needed)
# Again, uses training split only
tuned = tune_model(model, ...)

# Finalize model (so that the model with the best hyperparameters is retrained on the entire dataset)
final = finalize_model(tuned)

# Any other steps you would like to do.
...

Finally, you can save the entire pipeline as a pkl file for later use:

# Saves the model + pipeline as a pkl file
save_model(final, "my_best_model") 
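The saved .pkl file holds the whole preprocessing-plus-model pipeline, and PyCaret's `load_model` is the supported way to read it back. As an illustration of the round trip, here is a minimal sketch using a plain scikit-learn Pipeline and Python's `pickle` module (the file name and estimators are placeholders, not PyCaret's actual output):

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Build and fit a small preprocessing + model pipeline
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
pipe = Pipeline([("scale", StandardScaler()), ("lr", LogisticRegression())])
pipe.fit(X, y)

# Serialize the whole pipeline to a pkl file, then load it back
with open("my_best_model.pkl", "wb") as f:
    pickle.dump(pipe, f)
with open("my_best_model.pkl", "rb") as f:
    restored = pickle.load(f)

# The restored pipeline predicts identically to the original
assert (restored.predict(X) == pipe.predict(X)).all()
```

The key point is that the pickle contains the fitted transformations and the model together, so the loaded object can score new data with no extra setup.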
Nikhil Gupta

A partial answer: in 2.6.10 or 3.0.0rc1 you can get the (incomplete) preprocessing pipeline with get_config("prep_pipe").
Just run setup as in the examples, store the returned experiment as cdf, and try cdf.pipeline; you should get output like Pipeline(...)

L8R

When working with pycaret==3.0.0rc4, you have two options.

Option 1:

get_config("pipeline")

Option 2:

lb = get_leaderboard()
lb.iloc[0]['Model']

Option 1 will give you the transformations done to the data whilst option 2 will give you the same plus the model and its parameters.

Here's some sample code (from a notebook, based on the Binary Classification Tutorial (CLF101) - Level Beginner in their documentation):

from pycaret.datasets import get_data
from pycaret.classification import *

dataset = get_data('credit')

data = dataset.sample(frac=0.95, random_state=786).reset_index(drop=True)
data_unseen = dataset.drop(data.index).reset_index(drop=True)

exp_clf101 = setup(data = data, target = 'default', session_id=123) 

best = compare_models()

evaluate_model(best)


# OPTION 1
get_config("pipeline")


# OPTION 2
lb = get_leaderboard()
lb.iloc[0]['Model']
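Either way, what you get back is a scikit-learn Pipeline object, so you can inspect the transformation steps and the final estimator's parameters directly. A minimal sketch with a hand-built Pipeline standing in for the object the calls above return (the step names here are illustrative, not PyCaret's):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the object returned by get_config("pipeline")
# or lb.iloc[0]['Model']
pipe = Pipeline([("scale", StandardScaler()), ("lr", LogisticRegression())])

# List each step: name plus the class doing the work
for name, step in pipe.steps:
    print(name, type(step).__name__)

# The last step is the model; its hyperparameters are all readable
model = pipe.steps[-1][1]
print(model.get_params()["C"])
```

Printing the pipeline's repr (as the notebook cells above do implicitly) shows the same information in one block.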