
I am using the PyCaret library and created a CatBoost model with it.

[screenshot of the model's evaluation metrics]

The model has a great AUC score but pretty bad Recall and F1, which suggests that the default threshold of 0.5 is not ideal and that there is some threshold that would give a good score for both of those metrics.

Is there any way to find this threshold? I am not sure how to go about this since I am just trying out PyCaret.
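
For reference, outside of PyCaret I would do something like the sketch below with plain scikit-learn: take the predicted probabilities for the positive class on a holdout set and sweep the candidate thresholds to find the one that maximizes F1. The synthetic data and LogisticRegression here are only stand-ins for my actual CatBoost probabilities; I am wondering whether PyCaret has a built-in way to do the same thing.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_recall_curve
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in; in my real case these would be the holdout labels
    # and the CatBoost model's predicted probabilities for the positive class.
    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, stratify=y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    proba_valid = clf.predict_proba(X_valid)[:, 1]

    # Sweep every candidate threshold and pick the one that maximizes F1.
    precision, recall, thresholds = precision_recall_curve(y_valid, proba_valid)
    f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
    best = np.argmax(f1)
    print(f"best threshold={thresholds[best]:.3f}  F1={f1[best]:.3f}  recall={recall[best]:.3f}")

    # Labels with the tuned threshold instead of the default 0.5
    y_pred = (proba_valid >= thresholds[best]).astype(int)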

– DanCor

1 Answer


Which threshold do you mean? One for feature selection? You can try several adjustments to improve the model compared to your baseline in the picture above:

  1. compare_models() - maybe there are other algorithms that perform better than CatBoost.
  2. Feature selection - RFE or Random Forest (here you can use the feature_selection parameter in PyCaret and play with its threshold; the Boruta algorithm is worth checking as well).
  3. Feature engineering.
  4. Set fold=5 (i.e. change the number of cross-validation folds).
  5. Try several train/test splits (80/20, 70/30, etc.).
  6. In the PyCaret setup(), double-check the numeric and categorical features and change the data types where needed.

Try compare_models() first; a rough sketch of the whole workflow is below.
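
Roughly what I mean, as a sketch (the file name, target column and feature lists are placeholders, and some setup() options such as the feature-selection threshold are named differently across PyCaret versions, so check the docs for yours):

    import pandas as pd
    from pycaret.classification import setup, compare_models, create_model, predict_model

    df = pd.read_csv("your_data.csv")            # placeholder for your dataset

    # Points 5 and 6: pick the train/test split yourself and declare the column
    # types explicitly so nothing gets inferred with the wrong dtype.
    exp = setup(
        data=df,
        target="target",                         # placeholder target column
        train_size=0.8,                          # try 80/20, 70/30, ...
        feature_selection=True,                  # point 2: built-in feature selection
        numeric_features=["age", "income"],      # placeholder column names
        categorical_features=["gender"],         # placeholder column names
        session_id=123,
    )

    # Points 1 and 4: compare all algorithms with 5-fold CV, sorted by F1 instead of AUC.
    best = compare_models(sort="F1", fold=5)

    # Or keep CatBoost and re-evaluate it on the holdout under the same settings.
    cat = create_model("catboost", fold=5)
    predict_model(cat)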

– Essegn
  • Please correct me if I am wrong, but to get the recall of the model, one must set a threshold on the predicted probability of each instance, right? So that we can assign the class labels. Does PyCaret automatically use 0.5 as that threshold? And how can we find the best threshold if we have a high AUC? – DanCor Jul 23 '21 at 15:31
  • You are probably wrong. Recall = TruePositives / (TruePositives + FalseNegatives). Simply put, your model is bad and needs to be improved. I had similar results last week. Maybe feature selection would be a good first step. – Essegn Jul 24 '21 at 21:20