I'm using PyCaret classification on a dataset of more than 1 million rows (18 categorical features and 1 numerical feature). The data is pulled from an Oracle database into a pandas DataFrame, which takes about 2-3 minutes. The preprocessing step, however, takes more than 7 hours. Is there a way to speed it up?
Here's my Python code (SQL_Query is the DataFrame passed to setup):
from pycaret.classification import *

# init setup
clfl = setup(
    data=SQL_Query,
    target='cat_ind',
    silent=True,
    html=False,
    # column names are passed as strings
    categorical_features=['cat1', 'cat2', 'cat3', 'cat4', 'cat5', 'cat6',
                          'cat7', 'cat8', 'cat9', 'cat10', 'cat11', 'cat12',
                          'cat13', 'cat14', 'cat15', 'cat16', 'cat17'],
    numeric_features=['amt'],
    ignore_features=['paid', 'catignore'],
    remove_outliers=True,
    train_size=0.9,
    handle_unknown_categorical=True,
    unknown_categorical_method='most_frequent',
)
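
A possible first diagnostic, as a minimal sketch: if the preprocessing one-hot encodes the categorical features, columns with many distinct values would widen the data considerably, so counting distinct values per categorical column may show whether that is a factor here. The helper name `cardinality_report` and the toy data are hypothetical and only for illustration; the function assumes the data can be indexed by column name, which holds for a plain dict of lists and for a pandas DataFrame such as SQL_Query.

```python
def cardinality_report(columns, categorical):
    """Return {column name: number of distinct non-missing values}.

    `columns` is anything indexable by column name that yields an iterable
    of values, e.g. a dict of lists or a pandas DataFrame (assumption).
    """
    report = {}
    for name in categorical:
        # v == v is False only for NaN, so missing values are skipped
        distinct = {v for v in columns[name] if v == v}
        report[name] = len(distinct)
    return report


# Toy stand-in for the real data (hypothetical values)
toy = {
    "cat1": ["a", "b", "a", "c"],
    "cat2": ["x", "x", "x", "x"],
    "amt": [1.0, 2.0, 3.0, 4.0],
}

report = cardinality_report(toy, ["cat1", "cat2"])

# Rough upper bound on the number of columns after one-hot encoding
one_hot_width = sum(report.values())

print(report)
print("approximate one-hot width:", one_hot_width)
```

With the real DataFrame, the call would be `cardinality_report(SQL_Query, [...])` with the same list of categorical column names that is passed to setup. A very large total would suggest the encoding step is worth looking at before anything else; this does not measure the outlier-removal step, which would need to be timed separately.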