
I'm using the Titanic dataset, so it's fairly balanced (about 60:40), and a GaussianNB model with default parameters has an accuracy of 0.659. When I plotted F1, precision, and recall, I discovered the reason for such a low score.

[Figure: F1, precision, and recall of GaussianNB]

[Figure: Confusion matrix]

Then I calculated ROC-AUC and got 0.84. I've spent hours trying to understand what's happening here, but every answer and blog post I found is mainly about how misleading ROC-AUC can be on imbalanced data. What makes ROC-AUC so high even though we can clearly see the model isn't doing well? Is it because of the high recall?

The ROC-AUC Score of LogisticRegression: 0.861
The ROC-AUC Score of LinearDiscriminant: 0.859
The ROC-AUC Score of KNeighbors: 0.855
The ROC-AUC Score of SVC: 0.836
The ROC-AUC Score of GaussianProcess: 0.860
The ROC-AUC Score of DecisionTree: 0.785
The ROC-AUC Score of GaussianNB: 0.840
[Figure: ROC curves for the other models]


1 Answer


The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at every possible classification threshold, and ROC-AUC is the area under that curve. Accuracy, by contrast, requires picking a single threshold (scikit-learn's predict uses 0.5 by default) to turn predicted probabilities into 0/1 labels, and it is then measured at that one operating point. Note also that ROC-AUC ranges over [0, 1], with 0.5 corresponding to random ranking, not [0.5, 1]. So the situation you describe is entirely possible: the model ranks positives above negatives quite well (hence the high AUC), but the default 0.5 threshold is a poor operating point for it, which drags down accuracy, precision, and F1. Moving the threshold would change accuracy, but would not change ROC-AUC at all.
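Here is a minimal sketch of that difference. It uses a synthetic dataset from make_classification with an assumed 60:40 class split as a stand-in for the actual Titanic features (which aren't in the question), and shows that ROC-AUC is computed from the predicted probabilities alone while accuracy swings with the chosen threshold:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve, auc

# Synthetic stand-in for the Titanic data: ~60:40 class balance (assumption).
X, y = make_classification(n_samples=1000, weights=[0.6, 0.4], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB().fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # score for the positive class

# ROC-AUC is threshold-free: it only uses the ranking of the scores.
print("ROC-AUC:", roc_auc_score(y_test, proba))

# The ROC curve itself: TPR vs. FPR at every candidate threshold.
fpr, tpr, _ = roc_curve(y_test, proba)
print("Area under ROC curve:", auc(fpr, tpr))  # same value as above

# Accuracy depends entirely on the threshold used to binarize the scores.
for t in (0.3, 0.5, 0.7):
    acc = accuracy_score(y_test, proba >= t)
    print(f"Accuracy at threshold {t}: {acc:.3f}")
```

You could sweep thresholds the same way on your own predicted probabilities; a high-recall, low-precision picture like yours usually means the default 0.5 cutoff sits at a bad operating point even though the ranking behind the AUC is good.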
