TL;DR: When I calculate precision, recall, and F1 through cross-validation with cross_val_score(), it gives me different results than when I calculate them from the confusion matrix. Why are the precision, recall, and F1 scores different?
I'm learning SVMs in machine learning, and I wanted to compare the results returned by cross_val_score with the results I get from manually calculating the metrics from the confusion matrix. However, I get different results.

To start, I have written the code below using cross_val_score:
from sklearn import svm, metrics
from sklearn.model_selection import KFold, cross_val_score

# X, y are my feature matrix and labels (loaded earlier)
clf = svm.SVC()
kfold = KFold(n_splits=10)

# one scorer per metric, macro-averaged where applicable
accuracy = metrics.make_scorer(metrics.accuracy_score)
precision = metrics.make_scorer(metrics.precision_score, average='macro')
recall = metrics.make_scorer(metrics.recall_score, average='macro')
f1 = metrics.make_scorer(metrics.f1_score, average='macro')

accuracy_score = cross_val_score(clf, X, y, scoring=accuracy, cv=kfold)
precision_score = cross_val_score(clf, X, y, scoring=precision, cv=kfold)
recall_score = cross_val_score(clf, X, y, scoring=recall, cv=kfold)
f1_score = cross_val_score(clf, X, y, scoring=f1, cv=kfold)

print("accuracy score:", accuracy_score.mean())
print("precision score:", precision_score.mean())
print("recall score:", recall_score.mean())
print("f1 score:", f1_score.mean())
The result for each metric is shown below:
accuracy score: 0.97
precision score: 0.96
recall score: 0.97
f1 score: 0.96
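(As a side note, I believe the same per-fold scores can also be obtained in a single pass with cross_validate and sklearn's built-in scorer strings; the sketch below is just how I understand that API and is not part of my problem.)

from sklearn import svm
from sklearn.model_selection import KFold, cross_validate

# Same per-fold evaluation as above, but in one call. X, y are defined as before.
# 'precision_macro', 'recall_macro', and 'f1_macro' are (as far as I know)
# sklearn's string aliases for the macro-averaged scorers.
clf = svm.SVC()
kfold = KFold(n_splits=10)
scores = cross_validate(
    clf, X, y, cv=kfold,
    scoring=['accuracy', 'precision_macro', 'recall_macro', 'f1_macro'],
)
print("accuracy score:", scores['test_accuracy'].mean())
print("precision score:", scores['test_precision_macro'].mean())
print("recall score:", scores['test_recall_macro'].mean())
print("f1 score:", scores['test_f1_macro'].mean())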
In addition, I created a confusion matrix so that I can manually calculate the accuracy, precision, recall, and F1 score from its values. I built the confusion matrix by hand because I am using k-fold cross-validation, so I first have to collect the actual classes and the predicted classes from every iteration of the cross-validation. That is what this code does:
import copy as cp
from typing import Tuple

import numpy as np
from sklearn.model_selection import KFold


def cross_val_predict(model, kfold: KFold, X: np.ndarray, y: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    model_ = cp.deepcopy(model)

    # gets the number of classes in the label column
    no_of_classes = len(np.unique(y))

    # initialize empty numpy arrays to be returned
    actual_classes = np.empty([0], dtype=int)
    predicted_classes = np.empty([0], dtype=int)

    for train_index, test_index in kfold.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        # append the actual classes for this fold
        actual_classes = np.append(actual_classes, y_test)

        # fit the model on the training folds and predict on the test fold
        model_.fit(X_train, y_train)
        predicted_classes = np.append(predicted_classes, model_.predict(X_test))

    return actual_classes, predicted_classes
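(For completeness: sklearn ships its own cross_val_predict in sklearn.model_selection, which, as far as I understand, returns the out-of-fold predictions aligned with the rows of X, so y itself can serve as the actual classes. A minimal sketch of how I think it could replace my helper, just for reference:)

from sklearn.model_selection import cross_val_predict as sk_cross_val_predict

# Aliased so it does not clash with my helper of the same name.
# Out-of-fold predictions, one per row of X, in the original row order
# (as far as I understand the built-in behaviour).
predicted = sk_cross_val_predict(clf, X, y, cv=kfold)
cm_builtin = metrics.confusion_matrix(y_true=y, y_pred=predicted)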
Afterwards, I called my helper and built the confusion matrix from its output:
actual_classes, predicted_classes = cross_val_predict(clf, kfold, X, y)

cm = metrics.confusion_matrix(y_true=actual_classes, y_pred=predicted_classes)
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[2, 4])
cm_display.plot()
Now, my confusion matrix looks like the one below, where the columns are the predicted labels and the rows are the true labels:

         |   2  |   4  |
    -----|------|------|
      2  |  431 |   13 |
    -----|------|------|
      4  |    9 |  230 |
    -----|------|------|
If I manually calculate the accuracy, precision, recall, and F1 score from that matrix, I get the following:

confusion matrix accuracy: 0.97
confusion matrix precision: 0.95
confusion matrix recall: 0.96
confusion matrix f1 score: 0.95
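(For reference, my manual calculation just applies the usual per-class formulas to the matrix and averages them over the two classes; I believe the same pooled numbers can be cross-checked directly from the concatenated arrays like this, assuming the calculation is equivalent to macro averaging:)

# Cross-check of the hand calculation: macro-averaged metrics computed on the
# pooled predictions from all folds (this is an assumption about what my
# manual calculation from the confusion matrix should be equivalent to).
print("pooled accuracy: ", metrics.accuracy_score(actual_classes, predicted_classes))
print("pooled precision:", metrics.precision_score(actual_classes, predicted_classes, average='macro'))
print("pooled recall:   ", metrics.recall_score(actual_classes, predicted_classes, average='macro'))
print("pooled f1:       ", metrics.f1_score(actual_classes, predicted_classes, average='macro'))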
My question is: why do I get different results when I manually calculate the metrics from the confusion matrix than when I call cross_val_score with the corresponding scorers, i.e., [accuracy, precision, recall, f1]?

I hope you guys can help me understand why. Thank you very much for your responses!