I am trying to predict diabetes, where 1 = diabetic and 0 = non-diabetic, using a random forest and a decision tree. My data is significantly imbalanced, which causes my classifiers to score 0 on sensitivity and 99 on specificity. I tried several methods, including resampling my data with SMOTE. Now I want to optimize the model for precision to increase the true positive rate, but when I run the grid search it raises the following warning:
UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples.
I tried to predict anyway, and the results are the same as when I didn't use the precision optimization.
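For reference, the warning can be reproduced in isolation. The snippet below is a minimal sketch with made-up labels (the arrays `y_true` and `y_pred` are illustrative, not my real data): when no sample is predicted as class 1, precision is TP / (TP + FP) = 0 / 0, so scikit-learn reports 0.0 and emits the warning.

```python
import warnings

import numpy as np
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import precision_score, recall_score

# Toy labels (illustrative only): two positives among six samples.
y_true = np.array([0, 0, 0, 0, 1, 1])
# A classifier that never predicts the positive class.
y_pred = np.zeros_like(y_true)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # No predicted positives: TP + FP = 0, so precision is undefined.
    precision = precision_score(y_true, y_pred)

# Recall is well defined here: TP / (TP + FN) = 0 / 2.
recall = recall_score(y_true, y_pred)
warned = any(issubclass(w.category, UndefinedMetricWarning) for w in caught)

print(precision, recall, warned)
```

So the warning seems to indicate that, for at least one parameter combination and fold, the forest predicted 0 for every sample.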
My code looks like this:
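from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV

cl = RandomForestClassifier()
# Hyperparameter grid to search
params = {
    'n_estimators': [100, 300, 500, 800, 1000],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False],
    # 'auto' (equivalent to 'sqrt' for classifiers) was removed in scikit-learn 1.3
    'max_features': ['sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8],
}
# Metrics computed for every parameter combination
scorers = {
    'precision_score': make_scorer(precision_score),
    'recall_score': make_scorer(recall_score),
    'accuracy_score': make_scorer(accuracy_score),
}
# refit='precision_score' refits the best estimator according to precision
clff = GridSearchCV(estimator=cl, scoring=scorers, param_grid=params, refit='precision_score', cv=5, verbose=0)
forestscore = clff.fit(X_train, y_train)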
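One variant that might be worth comparing is sketched below. It is not my original setup, and everything that differs from it is an assumption: the synthetic data from `make_classification` stands in for the diabetes data, the grid is shrunk so the sketch runs quickly, `class_weight='balanced'` reweights the minority class, and `zero_division=0` is passed to `precision_score` so that folds with no predicted positives score 0 without a warning. Whether this improves sensitivity on the real data is untested.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real data (assumption): about 5% positives.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.95, 0.05], random_state=0
)

# class_weight='balanced' is an assumption, not part of the original code.
forest = RandomForestClassifier(class_weight="balanced", random_state=0)

# Deliberately small grid so the sketch runs quickly.
small_grid = {"n_estimators": [50], "max_depth": [4, 6]}

scorers = {
    # zero_division=0 scores an all-negative fold as 0 without warning.
    "precision_score": make_scorer(precision_score, zero_division=0),
    "recall_score": make_scorer(recall_score),
    "accuracy_score": make_scorer(accuracy_score),
}

search = GridSearchCV(
    estimator=forest,
    scoring=scorers,
    param_grid=small_grid,
    refit="precision_score",
    cv=3,
)
search.fit(X, y)

# Per-candidate mean scores show whether any positives are predicted at all.
print(search.best_params_)
print(search.cv_results_["mean_test_precision_score"])
print(search.cv_results_["mean_test_recall_score"])
```

Looking at `mean_test_recall_score` next to the precision column shows whether the candidates predict any positives at all, which the precision value alone hides when it is set to 0.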
Could someone help me understand where the problem is and what to do about it?