
The test set only contains classes 1 and 3, as shown by the print output below. However, when I plot the heatmap of the confusion matrix with seaborn, the heatmap labels the axes with classes 0 and 2.

The plot appears to be shifted one row down; I assume the problem is caused by indexing.

from collections import Counter

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cf_matrix = confusion_matrix(y_true, y_pred)
print(Counter(y_pred))
print(Counter(y_true))

# normalize each row of the confusion matrix
cmn = cf_matrix.astype('float') / cf_matrix.sum(axis=1)[:, np.newaxis]
plt.figure(figsize=(15, 15))
sns.heatmap(cmn, annot=True, fmt='.1f')
Counter({3: 100489, 12: 11306, 11: 4314, 4: 3303, 8: 2510, 7: 1850, 5: 185, 10: 132, 2: 69})
Counter({3.0: 117955, 1.0: 6203})

[image: seaborn heatmap whose axis labels show classes 0 and 2]

Leo
  • I don't understand what exactly you're after. Could you add test data, such as in my example, and clearly indicate what you need to be different? – JohanC Jan 27 '23 at 21:48

1 Answer


As cmn is a NumPy array, seaborn doesn't know the names of its rows and columns, so it defaults to 0, 1, 2, .... It also helps to make sure both arrays, y_pred and y_true, are of the same integer type, e.g. y_true = y_true.astype(int).

Scikit-learn provides the function unique_labels to fetch the labels it used.
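As a quick sketch, unique_labels merges both arrays and returns the sorted unique labels, which is the same row/column order confusion_matrix uses by default:

```python
from sklearn.utils.multiclass import unique_labels

# merges y_true and y_pred and returns the sorted unique labels,
# matching the row/column order of confusion_matrix
labels = unique_labels([3, 3, 3, 3, 1, 1], [7, 3, 3, 1, 2, 1])
print(labels)  # [1 2 3 7]
```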

You can temporarily suppress the warning for division by zero via with np.errstate(invalid='ignore'):.
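A small sketch of why this matters: a class that never occurs in y_true produces an all-zero row in the confusion matrix, and dividing that row by its (zero) row sum triggers a RuntimeWarning unless suppressed:

```python
import numpy as np

cf = np.array([[2, 1],
               [0, 0]])  # second class never occurs in y_true

with np.errstate(invalid='ignore'):
    # the zero row quietly becomes NaN instead of emitting a RuntimeWarning
    cmn = cf.astype(float) / cf.sum(axis=1, keepdims=True)

print(cmn)
```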

For testing, you could create some simple arrays which are easy to count manually, and investigate how confusion_matrix(y_true, y_pred) works in that case.

from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np

y_true = [3, 3, 3, 3, 1, 1]
y_pred = [7, 3, 3, 1, 2, 1]

# make sure both arrays are of the same type
y_true = np.array(y_true).astype(int)
y_pred = np.array(y_pred).astype(int)

cf_matrix = confusion_matrix(y_true, y_pred)
with np.errstate(invalid='ignore'):
    cmn = cf_matrix.astype('float') / cf_matrix.sum(axis=1, keepdims=True)

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 6))

sns.set()
sns.heatmap(cmn, annot=True, fmt='.1f', annot_kws={"fontsize": 14},
            linewidths=2, linecolor='black', clip_on=False, ax=ax1)
ax1.set_title('using default labels', fontsize=16)

labels = unique_labels(y_true, y_pred)
sns.heatmap(cmn, xticklabels=labels, yticklabels=labels,
            annot=True, fmt='.1f', annot_kws={"fontsize": 14},
            linewidths=2, linecolor='black', clip_on=False, ax=ax2)
ax2.set_title('using the same labels as sk-learn', fontsize=16)

for ax in (ax1, ax2):
    ax.tick_params(labelsize=20, rotation=0)
    ax.set_xlabel('Predicted value', fontsize=18)
    ax.set_ylabel('True value', fontsize=18)
plt.tight_layout()
plt.show()

[figure: sklearn's confusion_matrix shown with sns.heatmap, using default labels (left) vs. sklearn's labels (right)]

JohanC
  • Thank you for answering but I did not explain myself clearly so I have updated my question. Would you please take another look? – Leo Jan 27 '23 at 20:47