
I'm using the Pycaret library in Colab to make a simple prediction on this dataset:

https://www.kaggle.com/andrewmvd/fetal-health-classification

When I run my code:

from pycaret.utils import enable_colab 
enable_colab()


from google.colab import drive
drive.mount('/content/drive')


import pandas as pd
from pycaret.classification import *
from pandas_profiling import ProfileReport


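df = pd.read_csv("/content/drive/MyDrive/Pycaret/fetal_health.csv")


# Keep the first 11 feature columns plus the target; .copy() avoids a SettingWithCopyWarning
df2 = df.iloc[:, :11].copy()
df2['fetal_health'] = df['fetal_health']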



test = df2.sample(frac=0.10, random_state=42, weights='fetal_health')
train = df2.drop(test.index)

test.reset_index(inplace=True, drop=True)
train.reset_index(inplace=True, drop=True)


clf = setup(data =train, target = 'fetal_health', session_id=42,
 log_experiment=True, experiment_name='fetal', normalize=True)

best = compare_models(sort="Accuracy")


rf = create_model('rf', fold=30)


tuned_rf = tune_model(rf, optimize='Accuracy')


predict_model(tuned_rf)

I get this error:

[image: error message]

I think this is because my target variable is imbalanced (see image) and that this is causing the predictions to be incorrect.

[image: distribution of the target variable]

Can someone please help me understand? Thanks in advance.

3 Answers

Have you run each step in a separate cell to check the outputs?

Run

clf = setup(data =train, target = 'fetal_health', session_id=42,
 log_experiment=True, experiment_name='fetal', normalize=True)

and check:

  1. Are all variable types correctly inferred? (E.g., using your code with the Kaggle dataset of the same name, all variables show as numeric except for severe_decelerations, which shows as "Categorical". Is that correct?)

  2. Is there any preprocessing configuration that needs to change? I'm sure your issue has nothing to do with an imbalanced target variable, but you can test it yourself by changing your setup (adding fix_imbalance = True to change the default; it shows as False when you check the setup output).
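Both checks can be previewed in plain pandas before calling setup(). The snippet below is only a sketch: the small `train` frame and its values are made up for illustration, and the remark about low-cardinality columns is an assumption about why a numeric column might be inferred as categorical, not something confirmed in this thread.

```python
import pandas as pd

# Toy stand-in for the training frame; the values are made up for illustration.
train = pd.DataFrame({
    "severe_decelerations": [0.0, 0.0, 0.001, 0.0, 0.0, 0.0],
    "fetal_health": [1.0, 1.0, 3.0, 1.0, 2.0, 1.0],
})

# Storage type of each column as pandas sees it.
print(train.dtypes)

# Number of distinct values per column. A numeric column with very few
# distinct values is a candidate for being treated as categorical (assumption).
distinct = train.nunique()
print(distinct)

# Share of rows per target class, to see how imbalanced the target is.
class_share = train["fetal_health"].value_counts(normalize=True).sort_index()
print(class_share)
```

On the real data, the same three calls should show whether severe_decelerations has only a handful of distinct values and how skewed fetal_health is.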

You can learn more about the available preprocessing configurations here:

https://pycaret.gitbook.io/docs/get-started/preprocessing

Also, while troubleshooting, you can save yourself some work by using

best_model = create_model(best, fold=30)
predict_model(best_model)

(No need to look up the best model to add manually to create_model(), or to use tune_model(), until you get the model working.)

A. Beal

I found what the problem was: my target variable has 3 different values, and they begin at 1. This causes an error when Pycaret runs a list comprehension over the classes (because it starts at index zero). To solve it, I transformed the target so that it begins at zero, and it worked fine.
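A minimal sketch of that transformation, assuming the target column holds the values 1, 2 and 3 (stored as floats in the CSV). The toy `train` frame and the `original_label` dictionary are hypothetical names used only for illustration; on the real data the same line would be applied to the frame before it is passed to setup().

```python
import pandas as pd

# Toy stand-in for the real training frame; the values are made up.
train = pd.DataFrame({
    "baseline value": [120.0, 132.0, 134.0, 122.0],
    "fetal_health": [1.0, 2.0, 3.0, 1.0],
})

# Shift the labels from 1..3 to 0..2 and store them as integers.
train["fetal_health"] = train["fetal_health"].astype(int) - 1

# Hypothetical helper mapping: shifted label -> original label,
# useful for reading predictions back in the original coding.
original_label = {0: 1, 1: 2, 2: 3}

print(train["fetal_health"].tolist())
```

Subtracting 1 is enough here because the classes are consecutive integers; for labels with gaps, an explicit mapping would be the safer choice.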


Leandro,

thank you so much for your solution! I was having the same problem with the same dataset!

A. Beal, I tried your solution, but still the same error message appeared, so I tried Leandro's solution, and the problem was, in fact, the target beginning with 1, and not 0. Thank you for your suggestion on how to reduce the code!