0

I am using the Python PyCaret module to analyze a large data set. setup, compare_models and create_model all ran correctly, but when I use the model I created to predict on the unseen data I split off at the beginning, only one row comes back. There should be about 100k rows to predict. I skipped the tuning step because it takes too long, but I don't think that is the reason.

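from pycaret.regression import setup, create_model, predict_model

# Train on a random 80% of the rows; hold out the remaining 20% as unseen data
TSLASAMPLE = TSLA.sample(frac=0.8)
data_unseen = TSLA.drop(TSLASAMPLE.index)
TSLASAMPLE.reset_index(drop=True, inplace=True)
data_unseen.reset_index(drop=True, inplace=True)

TSLAinput = setup(data=TSLASAMPLE, target='prtPrice', use_gpu=True, html=False, silent=True)
dt = create_model('dt')  # decision tree regressor
prediction = predict_model(dt, data=data_unseen)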

output:

   Model                    MAE     MSE     RMSE    R2      RMSLE   MAPE
0  Decision Tree Regressor  0.1842  1.8393  1.3562  0.9996  0.0303  0.0082
James Z

2 Answers

1

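This is expected. The single row you see is the table of metrics computed on the unseen data. The actual predictions are in your prediction variable: predict_model returns a DataFrame with one row per row of data_unseen and the predicted value in an added column.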
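A minimal sketch of how you could check this. It assumes the added column is named "Label" (PyCaret 2.x) or "prediction_label" (PyCaret 3.x). The helper prediction_column is a hypothetical name, and the small DataFrame is a made-up stand-in for what predict_model(dt, data=data_unseen) returns, so the values are placeholders only.

```python
import pandas as pd


def prediction_column(prediction: pd.DataFrame) -> str:
    """Return the name of the column holding the predicted values.

    Assumption: PyCaret 3.x names it 'prediction_label', PyCaret 2.x 'Label'.
    """
    for name in ("prediction_label", "Label"):
        if name in prediction.columns:
            return name
    raise KeyError("no prediction column found in the returned DataFrame")


# Stand-in for: prediction = predict_model(dt, data=data_unseen)
# (placeholder values, not real results)
prediction = pd.DataFrame(
    {"prtPrice": [10.0, 12.5, 11.0], "Label": [10.1, 12.4, 11.2]}
)

col = prediction_column(prediction)

# One row per unseen row; with the real data this should match len(data_unseen)
print(prediction.shape)

# Actual target next to the predicted value
print(prediction[["prtPrice", col]].head())
```

With your real prediction variable, the first number in prediction.shape should equal the number of rows in data_unseen.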

Nikhil Gupta
0

Note that "create_model" returns a single trained model. It is "compare_models" that can return a list of trained models (when n_select is greater than 1), where the first element is the best model according to the sort metric.

If you want predictions on the unseen data from each model, loop over the list of models returned by compare_models:

You can try something like this:

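# n_select > 1 makes compare_models return a list of the top models;
# by default it returns only the single best model
model_list = compare_models(n_select=3)

predictions = []
for model in model_list:
    # Each call returns a DataFrame of predictions for the unseen data
    model_prediction = predict_model(model, data=data_unseen)
    predictions.append(model_prediction)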

The 'predictions' list stores the results for each model, in the same order as the models returned by 'compare_models()'.

Sheroz