How can I write a for loop that shows the confusion matrix for each KFold split, so that I can analyze why some of the recall scores are 0?

Here is the code for KFold:

from sklearn import model_selection
from sklearn.linear_model import LogisticRegression

# shuffle defaults to False here, so the folds are taken in row order
kfold = model_selection.KFold(n_splits=6, random_state=19)
modelCV = LogisticRegression()
recall = model_selection.cross_val_score(modelCV, X_under, y_under, cv=kfold, scoring='recall')
print(recall)

Here is the code for plotting a confusion matrix:

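import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

cnf_matrix = confusion_matrix(y_test, y_pred)
np.set_printoptions(precision=2)

# Recall = TP / (FN + TP), read from the row of actual positives
print("Recall metric in the testing dataset: ", cnf_matrix[1, 1] / (cnf_matrix[1, 0] + cnf_matrix[1, 1]))

# Plot non-normalized confusion matrix
# plot_confusion_matrix is a helper whose definition is not shown in this post
class_names = [0, 1]
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=class_names, title='Confusion matrix')
plt.show()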
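
The recall expression above divides by the number of actual positives, so it is undefined whenever a test set contains no positive samples. The sketch below illustrates that case; `recall_from_cm` and the label arrays are hypothetical names and made-up values used only for illustration, not part of the original code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def recall_from_cm(cm):
    """Recall for class 1 from a 2x2 confusion matrix; NaN if there are no actual positives."""
    fn, tp = cm[1, 0], cm[1, 1]
    positives = fn + tp
    return tp / positives if positives > 0 else float("nan")


# Hypothetical labels: a test set that contains no positive samples
y_true_demo = np.array([0, 0, 0, 0])
y_pred_demo = np.array([0, 1, 0, 0])

# labels=[0, 1] keeps the matrix 2x2 even though class 1 never occurs in y_true_demo
cm_demo = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])
print(cm_demo)
print(recall_from_cm(cm_demo))
```

When scikit-learn scores such a test set with `scoring='recall'`, it reports 0 and emits a warning rather than returning NaN, which is one way a fold can end up with a recall of exactly 0.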
BigData
  • 3
    Is your data sorted in any way? What happens if you add shuffle=True to KFold() ? That might explain why some recall scores in cross_val are zero. – Jarad Apr 19 '18 at 04:35
  • 1
    Thanks, Jarad, shuffle=True makes it work, but what does shuffle mean? The sklearn documentation says: "shuffle: boolean, optional. Whether to shuffle the data before splitting into batches." – BigData Apr 19 '18 at 04:52
  • 1
    Imagine you have 600 rows (samples) and pretend the last 100 samples are all zeros. Above you're doing 6-fold cross-validation, so the model is trained on 500 rows, 100 are left out for the test, and this repeats six times with a different "fold" held out each time. If the model is trained on 500 rows and then has to predict on 100 test samples that are all zeros, it may give an unexpected score. This could happen if the data is sorted in some way and not shuffled beforehand. This doesn't answer how to write a `for` loop to show each KFold though. – Jarad Apr 19 '18 at 05:14
  • 2
    Instead of KFold, I would recommend StratifiedKFold, which will preserve the ratio of classes in each fold. – Vivek Kumar Apr 19 '18 at 05:36
  • 1
    @Jarad. I dread to think how many people have made this mistake in the past without realising it. I know I made it when I first started using `cross_val_score`. I have made numerous comments/answers on here saying the same thing you have. – Stev Apr 19 '18 at 08:15
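
Following the comments above, one possible way to write the loop is to iterate over `cv.split(...)` directly instead of calling `cross_val_score`, and to build a confusion matrix from each fold's predictions. This is a minimal sketch, not a definitive implementation: `per_fold_confusion` is a hypothetical helper, the data is a synthetic stand-in for `X_under` / `y_under` that is sorted by class on purpose, and the `zero_division` argument assumes a reasonably recent scikit-learn.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import KFold, StratifiedKFold


def per_fold_confusion(model, X, y, cv):
    """Hypothetical helper: return one (confusion matrix, recall) pair per fold."""
    X, y = np.asarray(X), np.asarray(y)
    results = []
    for fold, (train_idx, test_idx) in enumerate(cv.split(X, y), start=1):
        fitted = clone(model).fit(X[train_idx], y[train_idx])  # fresh model per fold
        y_pred = fitted.predict(X[test_idx])
        # labels=[0, 1] keeps the matrix 2x2 even if a fold lacks one class
        cm = confusion_matrix(y[test_idx], y_pred, labels=[0, 1])
        # zero_division=0 reports 0 (without a warning) when a fold has no positives
        rec = recall_score(y[test_idx], y_pred, zero_division=0)
        print(f"Fold {fold}: recall = {rec:.2f}")
        print(cm)
        results.append((cm, rec))
    return results


# Synthetic stand-in for X_under / y_under, sorted by class on purpose
X_demo, y_demo = make_classification(n_samples=600, n_features=5, random_state=0)
order = np.argsort(y_demo, kind="stable")
X_demo, y_demo = X_demo[order], y_demo[order]

print("KFold without shuffling:")
sorted_results = per_fold_confusion(LogisticRegression(), X_demo, y_demo, KFold(n_splits=6))

print("StratifiedKFold with shuffling:")
strat_cv = StratifiedKFold(n_splits=6, shuffle=True, random_state=19)
strat_results = per_fold_confusion(LogisticRegression(), X_demo, y_demo, strat_cv)
```

On the sorted data, the first unshuffled test fold contains only class 0, so the bottom row of its confusion matrix is all zeros and its recall is reported as 0. With `StratifiedKFold`, every test fold contains both classes. To plot rather than print, each `cm` could be passed to whatever plotting helper is already in use.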

0 Answers