
I'm using PyCaret classification to do some machine learning on >1 million rows of data (18 categorical features and 1 numeric feature). A pandas DataFrame stores the data pulled from an Oracle database; those steps take about 2-3 minutes. Preprocessing the data, however, takes more than 7 hours. Is there a way to improve the speed?

Here's the Python code:

    from pycaret.classification import *

    # init setup
    clfl = setup(
        data=SQL_Query,
        target='cat_ind',
        silent=True,
        html=False,
        categorical_features=['cat1', 'cat2', 'cat3', 'cat4', 'cat5', 'cat6',
                              'cat7', 'cat8', 'cat9', 'cat10', 'cat11', 'cat12',
                              'cat13', 'cat14', 'cat15', 'cat16', 'cat17'],
        numeric_features=['amt'],
        ignore_features=['paid', 'catignore'],
        remove_outliers=True,
        train_size=0.9,
        handle_unknown_categorical=True,
        unknown_categorical_method='most_frequent',
    )
rlee300
  • I would try dumping the data to a flat file (CSV), then trying MLJAR AutoML (https://github.com/mljar/mljar-supervised). It can handle missing values and categorical columns, and it will produce reports for trained ML models. – pplonski Apr 08 '21 at 12:08
  • Not all algorithms support GPU; most don't. You need to read the documentation first. – Frank Aug 02 '21 at 13:14

2 Answers


In pycaret, you can pass use_gpu=True to setup() and turbo=True to compare_models().
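
As a minimal sketch (assuming pycaret 2.x, with SQL_Query and the column names taken from the question), that would look something like this:

    from pycaret.classification import *

    # use_gpu=True lets GPU-enabled estimators (e.g. XGBoost, LightGBM,
    # CatBoost) train on the GPU where supported
    clf = setup(
        data=SQL_Query,
        target='cat_ind',
        silent=True,
        html=False,
        use_gpu=True,
    )

    # turbo=True (the default) skips the most computationally expensive
    # estimators during model comparison, which shortens the run
    best = compare_models(turbo=True)

Note that use_gpu mainly accelerates model training for estimators that support it, so it may not help much with the preprocessing done inside setup() itself.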

Gopakumar G

What shape is the data after setup()?

With that many categorical features, there's a chance your feature count grew by orders of magnitude due to the one-hot encoding that pycaret's setup() applies by default.

If that is the case, you should use high_cardinality_features to list the features with a high number of unique values, and set high_cardinality_method to either 'frequency' or 'clustering'.
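
A rough sketch of what that could look like (assuming pycaret 2.x; 'cat3' and 'cat7' are hypothetical stand-ins for whichever columns actually have many unique values):

    from pycaret.classification import *

    clf = setup(
        data=SQL_Query,
        target='cat_ind',
        silent=True,
        html=False,
        # hypothetical: list the columns with many distinct levels here
        high_cardinality_features=['cat3', 'cat7'],
        # 'frequency' replaces each level with its frequency count instead
        # of one-hot encoding it; 'clustering' groups similar levels first
        high_cardinality_method='frequency',
    )

    # inspect the transformed training data to see how many columns
    # the encoding actually produced
    print(get_config('X').shape)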

Check the pycaret documentation for more info.

MarcinKamil