
I am trying to predict diabetes, where 1 = diabetic and 0 = non-diabetic, using a random forest and a decision tree. My data is significantly imbalanced, which causes my classifiers to score 0 on sensitivity and 99 on specificity. I tried several methods, including resampling my data with SMOTE. Now I want to optimize the model for precision to increase the true positive rate, but when I run the grid search it throws the following error:

UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples.

I tried to predict anyway, and the results are the same as when I didn't use the precision optimization.

My code looks like this:

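from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV

cl = RandomForestClassifier()
params = {
    'n_estimators': [100, 300, 500, 800, 1000],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False],
    # 'auto' was an alias for 'sqrt' for classifiers and was removed in scikit-learn 1.3
    'max_features': ['sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8],
}

# Several metrics are tracked; refit selects the model with the best precision
scorers = {
    'precision_score': make_scorer(precision_score),
    'recall_score': make_scorer(recall_score),
    'accuracy_score': make_scorer(accuracy_score)
}

clff = GridSearchCV(estimator=cl, scoring=scorers, param_grid=params, refit='precision_score', cv=5, verbose=0)

# X_train, y_train: the training split of the diabetes data
forestscore = clff.fit(X_train, y_train)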

Could someone help me understand where the problem is and what to do?

Luisa Ka

1 Answer


The problem is probably that your estimator always returns the same value, so some of the labels in y_train are never predicted. When no sample is predicted as positive, the precision cannot be computed. A similar problem is discussed in this thread:

UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples
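Before tuning anything, it may help to confirm that the classifier really has collapsed to a single class. The sketch below is only an illustration: the data is made up, and `DummyClassifier` stands in for a model that always predicts the majority class. Replace it with your own estimator and training data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_predict

# Hypothetical imbalanced data: 95 negatives and 5 positives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([0] * 95 + [1] * 5)

# Stand-in for a classifier that has collapsed to the majority class.
majority_only = DummyClassifier(strategy="most_frequent")

# Out-of-fold predictions, one per sample, from 5-fold cross-validation.
cv_pred = cross_val_predict(majority_only, X_demo, y_demo, cv=5)

# Count how often each label is predicted.
labels, counts = np.unique(cv_pred, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

If only the label 0 shows up in the counts, every precision computed on those predictions will trigger the warning.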

You get the same warning if you run these lines:

from sklearn.metrics import precision_score
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0]
precision_score(y_true, y_pred)

UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples.
  'precision', 'predicted', average, warn_for)
0.0

Precision is TP / (TP + FP), that is, the share of predicted positives that are truly positive. Because your predicted vector contains only 0s, there are no predicted positives, the denominator is zero, and the score is undefined. scikit-learn then sets it to 0.0 and emits the warning.
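One way to see where the warning comes from is to compute the pieces of the precision formula by hand on the same toy vectors. This is a small sketch; the `zero_division` argument assumes scikit-learn 0.22 or later, and it only controls the returned value and the warning, not the underlying problem.

```python
from sklearn.metrics import confusion_matrix, precision_score

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# Precision is tp / (tp + fp); with no predicted positives this is 0 / 0.
predicted_positives = tp + fp
print("predicted positives:", predicted_positives)

# zero_division sets the value returned in the undefined case
# and suppresses the UndefinedMetricWarning.
score = precision_score(y_true, y_pred, zero_division=0)
print("precision:", score)
```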

The way to get rid of this warning is to get your estimator to predict at least some 1s, so that the predictions passed to the score function are not all zeros.
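Two common ways to push a random forest toward predicting the minority class are class weighting and a lower decision threshold. The following is a rough sketch under assumptions, not a tuned solution: the data is synthetic (`make_classification` with about 5% positives), and the threshold of 0.2 is an arbitrary illustrative value that you would need to choose on validation data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data: roughly 5% positives.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=0
)

# class_weight="balanced" weights samples inversely to class frequency.
forest = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
forest.fit(X_tr, y_tr)

# predict() picks the most probable class, effectively a 0.5 cut-off.
default_pred = forest.predict(X_te)

# Lower the cut-off on the estimated probability of class 1.
proba_pos = forest.predict_proba(X_te)[:, 1]
threshold = 0.2  # hypothetical value; tune it on validation data
lowered_pred = (proba_pos >= threshold).astype(int)

print("positives at default cut-off:", int(default_pred.sum()))
print("positives at lowered cut-off:", int(lowered_pred.sum()))
```

Lowering the threshold usually raises recall at the cost of precision, so it is worth checking both metrics before settling on a value.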

Antonin G.
  • I split the datasets into train and test in the correct way, so I am really not sure what I should do. I obviously don't want to silence the warning, as that wouldn't help my predictions. The warning appears when I run the grid search @AntoninG. – Luisa Ka May 13 '19 at 13:40
  • I completed my previous answer to give you an example @LuisaKa – Antonin G. May 14 '19 at 09:28
  • Thank you very much! @AntoninG. – Luisa Ka May 15 '19 at 14:23