122

I want to plot a confusion matrix to visualize the classifier's performance, but it shows only the numbers of the labels, not the labels themselves:

from sklearn.metrics import confusion_matrix
from numpy import array
import pylab as pl

y_test=['business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business']

# predictions as a string array (same element type as y_test)
pred=array(['health', 'business', 'business', 'business', 'business',
       'business', 'health', 'health', 'business', 'business', 'business',
       'business', 'business', 'business', 'business', 'business',
       'health', 'health', 'business', 'health'])

cm = confusion_matrix(y_test, pred)
pl.matshow(cm)
pl.title('Confusion matrix of the classifier')
pl.colorbar()
pl.show()

How can I add the labels (health, business, etc.) to the confusion matrix?

EdChum
hmghaly
  • A very simple solution that needs no sklearn and still prints the labels: `pandas.crosstab(y_test, pred, rownames=['True'], colnames=['Predicted'], margins=True)` – nadya Jul 06 '22 at 21:27
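The `pandas.crosstab` suggestion in the comment above can be tried on the question's data as follows. This is only a minimal sketch: wrapping the lists in `pd.Series` and the variable name `table` are choices made here, not part of the comment.

```python
import pandas as pd

# Data from the question: every true label is 'business'.
y_test = ['business'] * 20
pred = ['health', 'business', 'business', 'business', 'business',
        'business', 'health', 'health', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'health', 'health', 'business', 'health']

# Rows are the true labels, columns the predicted labels;
# margins=True appends the 'All' totals row and column.
table = pd.crosstab(pd.Series(y_test), pd.Series(pred),
                    rownames=['True'], colnames=['Predicted'], margins=True)
print(table)
```

Note that a class which never occurs among the true labels (here `health`) gets no row of its own in the table.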

10 Answers

101

UPDATE:

Check out sklearn.metrics.ConfusionMatrixDisplay, which plots a confusion matrix with the class names on the axes.


OLD ANSWER:

I think it's worth mentioning the use of seaborn.heatmap here.

import seaborn as sns
import matplotlib.pyplot as plt     

labels = ['business', 'health']  # sorted order, as used by confusion_matrix

ax = plt.subplot()
sns.heatmap(cm, annot=True, fmt='g', ax=ax)  # annot=True to annotate cells, fmt='g' to disable scientific notation

# labels, title and ticks (rows and columns share the same class order)
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix')
ax.xaxis.set_ticklabels(labels)
ax.yaxis.set_ticklabels(labels)


akilat90
  • Suggestion: pass `fmt='g'` to the `heatmap` call to keep numbers from going to scientific notation. – polm23 Feb 20 '18 at 08:37
  • Suggestion: pass `cmap='Greens'` to the `heatmap` call to have intuitive color meaning. – EliadL Dec 27 '18 at 18:50
  • How to be sure you're not mixing up the labels? – Revolucion for Monica Feb 27 '20 at 12:15
  • @RevolucionforMonica When you get the `confusion_matrix`, the X axis tick labels are 1, 0 and Y axis tick labels are 0, 1 (in the axis values increasing order). If the classifier is `clf`, you can get the class order by `clf.classes_`, which should match `["health", "business"]` in this case. (It is assumed that `business` is the positive class). – akilat90 Feb 27 '20 at 12:59
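One way to avoid mixing up the labels, sketched here on the question's data, is to pass the class order explicitly through the `labels` argument of `confusion_matrix` and reuse the same list for the tick labels. The order chosen below is arbitrary and only meant to show that rows and columns follow it.

```python
from sklearn.metrics import confusion_matrix

# Data from the question
y_test = ['business'] * 20
pred = ['health', 'business', 'business', 'business', 'business',
        'business', 'health', 'health', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'health', 'health', 'business', 'health']

labels = ['health', 'business']  # rows (true) and columns (predicted) follow this order
cm = confusion_matrix(y_test, pred, labels=labels)
print(cm)

# Reusing the same list keeps ticks and matrix aligned, e.g.:
# sns.heatmap(cm, annot=True, fmt='g', xticklabels=labels, yticklabels=labels)
```

With a fitted scikit-learn classifier `clf`, `clf.classes_` gives the order the model itself uses.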
83

As hinted in this question, you have to "open" the lower-level artist API by storing the figure and axes objects returned by the matplotlib functions you call (the fig, ax and cax variables below). You can then replace the default x- and y-axis ticks using set_xticklabels/set_yticklabels:

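import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

labels = ['business', 'health']
cm = confusion_matrix(y_test, pred, labels=labels)  # labels is keyword-only in recent scikit-learn
print(cm)
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(cm)
plt.title('Confusion matrix of the classifier')
fig.colorbar(cax)
# place one tick per class before assigning the tick labels
ax.set_xticks(range(len(labels)))
ax.set_yticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()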

Note that I passed the labels list to the confusion_matrix function to make sure it's properly sorted, matching the ticks.

The resulting figure shows the class names on both axes.

Conor Livingston
metakermit
  • If you have more than a few categories, Matplotlib decides to label the axes incorrectly - you have to force it to label every cell. `from matplotlib.ticker import MultipleLocator; ax.xaxis.set_major_locator(MultipleLocator(1)); ax.yaxis.set_major_locator(MultipleLocator(1))` – rescdsk May 29 '14 at 19:11
  • As a newcomer, could you tell me whether the size of the 3 boxes implies the level of accuracy? – Borys Jul 01 '15 at 13:38
  • How do I display the numbers on them? Colors may not convey much in all cases. – kRazzy R Mar 01 '18 at 00:40
  • Hi @metakermit, could you tell me how to show the numbers inside the coloured figure? – Humaun Rashid Nayan Apr 19 '18 at 08:50
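Two points raised in these comments (one tick per class, and numbers inside the cells) could be handled roughly as below. This is a sketch on the question's data, not the answer author's code; the text color rule and the `Agg` backend line are assumptions made here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

# Data from the question
y_test = ['business'] * 20
pred = ['health', 'business', 'business', 'business', 'business',
        'business', 'health', 'health', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'health', 'health', 'business', 'health']

labels = ['business', 'health']
cm = confusion_matrix(y_test, pred, labels=labels)

fig, ax = plt.subplots()
cax = ax.matshow(cm)
fig.colorbar(cax)

# One tick per class, so no label is skipped when there are many classes.
ticks = np.arange(len(labels))
ax.set_xticks(ticks)
ax.set_xticklabels(labels)
ax.set_yticks(ticks)
ax.set_yticklabels(labels)

# Write each count into its cell; cm[i, j] is drawn at x=j, y=i.
# Dark text on bright (high) cells, light text on dark (low) cells.
for (i, j), count in np.ndenumerate(cm):
    color = 'black' if count > cm.max() / 2 else 'white'
    ax.text(j, i, str(count), ha='center', va='center', color=color)

ax.set_xlabel('Predicted')
ax.set_ylabel('True')
```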
45

I found a function that plots a confusion matrix generated by sklearn.

import numpy as np


def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix

    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']

    title:        the text to display at the top of the matrix

    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues

    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    ---------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    """
    import matplotlib.pyplot as plt
    import numpy as np
    import itertools

    accuracy = np.trace(cm) / np.sum(cm).astype('float')
    misclass = 1 - accuracy

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]


    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")


    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.show()

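The result is a heatmap with the class names on both axes, the value printed in each cell, and the accuracy and misclassification rate under the x-axis label.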
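The `normalize=True` branch of the function divides each row by its row sum, so every cell becomes the fraction of that true class. A small sketch of that step on the question's matrix follows; the guard for rows without any true samples is an addition made here (the function above would produce NaN for such a row).

```python
import numpy as np

# Confusion matrix for the question's data (rows: true, columns: predicted)
cm = np.array([[14, 6],
               [0, 0]])

row_sums = cm.sum(axis=1, keepdims=True)  # shape (n_classes, 1)

# Divide row-wise; rows whose sum is 0 are left at 0 instead of NaN.
cm_norm = np.divide(cm, row_sums,
                    out=np.zeros(cm.shape, dtype=float),
                    where=row_sums != 0)
print(cm_norm)
```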

georg-un
Calvin Duy Canh Tran
  • How would this work with more than 3 classes? – Kurt May 27 '21 at 12:24
  • @Kurt, when you have more than 3 classes, you just get a larger confusion matrix (i.e. a 10 x 10 matrix if you have 10 classes). NB: The `target_names` variable through which you pass the labels of your classes to the function needs to match the number of classes in the confusion matrix. – Sven van der Burg Jun 24 '21 at 12:49
45

To add to @akilat90's update about sklearn.metrics.plot_confusion_matrix:

You can use the ConfusionMatrixDisplay class within sklearn.metrics directly and bypass the need to pass a classifier to plot_confusion_matrix. It also has the display_labels argument, which allows you to specify the labels displayed in the plot as desired.

The constructor for ConfusionMatrixDisplay doesn't provide a way to do much additional customization of the plot, but you can access the matplotlib axes object via the ax_ attribute after calling its plot() method. I've added a second example showing this.

I found it annoying to have to rerun a classifier over a large amount of data just to produce the plot with plot_confusion_matrix. I am producing other plots off the predicted data, so I don't want to waste my time re-predicting every time. This was an easy solution to that problem as well.

Example:

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_true, y_preds, normalize='all')
cmd = ConfusionMatrixDisplay(cm, display_labels=['business','health'])
cmd.plot()

confusion matrix example 1

Example using ax_:

cm = confusion_matrix(y_true, y_preds, normalize='all')
cmd = ConfusionMatrixDisplay(cm, display_labels=['business','health'])
cmd.plot()
cmd.ax_.set(xlabel='Predicted', ylabel='True')

confusion matrix example

  • This is excellent - thanks! Question: Can you customize 'True label' and 'Predicted label' values for the axis labels? – caydin Oct 13 '20 at 01:00
  • I didn't realize this before, but you can access the matplotlib axes object via `cmd.ax_`, which allows a lot of control of the plot. To customize the axis labels use something like this: `cmd.ax_.set(xlabel='foo', ylabel='bar')` . I will update my answer. – themaninthewoods Oct 13 '20 at 20:41
  • Thanks a lot! But it looks like the `cmd.ax_.set` disables the `display_labels=['business','health']` ? – caydin Oct 14 '20 at 02:50
  • Also I'm getting `AttributeError: 'ConfusionMatrixDisplay' object has no attribute 'ax_' `. – caydin Oct 14 '20 at 02:53
  • Ah, you are right! Thanks for pointing those things out. In my excitement to find the solution I made a few mistakes in my update. Please see the latest version, it should work now. – themaninthewoods Oct 14 '20 at 05:15
  • "I found it annoying to have to rerun a classifier over a large amount of data just to produce the plot with plot_confusion_matrix." - I find it peculiar that this reruns the classifier and does not use the values given to it. I was getting mixed results using this and the simpler confusion_matrix(). Glad you pointed it out! – Vaidøtas I. Sep 01 '21 at 13:36
31
from sklearn import model_selection
test_size = 0.33
seed = 7
X_train, X_test, y_train, y_test = model_selection.train_test_split(feature_vectors, y, test_size=test_size, random_state=seed)

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix

model = LogisticRegression()
model.fit(X_train, y_train)
result = model.score(X_test, y_test)
print("Accuracy: %.3f%%" % (result*100.0))
y_pred = model.predict(X_test)
print("F1 Score: ", f1_score(y_test, y_pred, average="macro"))
print("Precision Score: ", precision_score(y_test, y_pred, average="macro"))
print("Recall Score: ", recall_score(y_test, y_pred, average="macro")) 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

def cm_analysis(y_true, y_pred, labels, ymap=None, figsize=(10,10)):
    """
    Generate matrix plot of confusion matrix with pretty annotations.
    The plot is shown on screen; to save it to disk, uncomment the
    plt.savefig line and supply a filename.
    args: 
      y_true:    true label of the data, with shape (nsamples,)
      y_pred:    prediction of the data, with shape (nsamples,)
      labels:    string array, name the order of class labels in the confusion matrix.
                 use `clf.classes_` if using scikit-learn models.
                 with shape (nclass,).
      ymap:      dict: any -> string, length == nclass.
                 if not None, map the labels & ys to more understandable strings.
                 Caution: original y_true, y_pred and labels must align.
      figsize:   the size of the figure plotted.
    """
    if ymap is not None:
        # change category codes or labels to new labels 
        y_pred = [ymap[yi] for yi in y_pred]
        y_true = [ymap[yi] for yi in y_true]
        labels = [ymap[yi] for yi in labels]
    # calculate a confusion matrix with the new labels
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    # calculate row sums (for calculating % & plot annotations)
    cm_sum = np.sum(cm, axis=1, keepdims=True)
    # calculate proportions
    cm_perc = cm / cm_sum.astype(float) * 100
    # empty array for holding annotations for each cell in the heatmap
    annot = np.empty_like(cm).astype(str)
    # get the dimensions
    nrows, ncols = cm.shape
    # cycle over cells and create annotations for each cell
    for i in range(nrows):
        for j in range(ncols):
            # get the count for the cell
            c = cm[i, j]
            # get the percentage for the cell
            p = cm_perc[i, j]
            if i == j:
                s = cm_sum[i, 0]  # scalar row total (cm_sum has shape (nrows, 1))
                # convert the proportion, count, and row sum to a string with pretty formatting
                annot[i, j] = '%.1f%%\n%d/%d' % (p, c, s)
            elif c == 0:
                annot[i, j] = ''
            else:
                annot[i, j] = '%.1f%%\n%d' % (p, c)
    # convert the array to a dataframe. To plot by proportion instead of number, use cm_perc in the DataFrame instead of cm
    cm = pd.DataFrame(cm, index=labels, columns=labels)
    cm.index.name = 'Actual'
    cm.columns.name = 'Predicted'
    # create empty figure with a specified size
    fig, ax = plt.subplots(figsize=figsize)
    # plot the data using the Pandas dataframe. To change the color map, add cmap=..., e.g. cmap = 'rocket_r'
    sns.heatmap(cm, annot=annot, fmt='', ax=ax)
    #plt.savefig(filename)
    plt.show()

cm_analysis(y_test, y_pred, model.classes_, ymap=None, figsize=(10,10))


This is based on https://gist.github.com/hitvoice/36cf44689065ca9b927431546381a3f7

Note that passing cmap='rocket_r' to the heatmap call reverses the color map, which arguably looks more natural.

filups21
Mona Jalal
23

You might be interested in https://github.com/pandas-ml/pandas-ml/

which provides a pandas-based implementation of a confusion matrix.

Some features:

  • plot confusion matrix
  • plot normalized confusion matrix
  • class statistics
  • overall statistics

Here is an example:

In [1]: from pandas_ml import ConfusionMatrix
In [2]: import matplotlib.pyplot as plt

In [3]: y_test = ['business', 'business', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business']

In [4]: y_pred = ['health', 'business', 'business', 'business', 'business',
       'business', 'health', 'health', 'business', 'business', 'business',
       'business', 'business', 'business', 'business', 'business',
       'health', 'health', 'business', 'health']

In [5]: cm = ConfusionMatrix(y_test, y_pred)

In [6]: cm
Out[6]:
Predicted  business  health  __all__
Actual
business         14       6       20
health            0       0        0
__all__          14       6       20

In [7]: cm.plot()
Out[7]: <matplotlib.axes._subplots.AxesSubplot at 0x1093cf9b0>

In [8]: plt.show()

Plot confusion matrix

In [9]: cm.print_stats()
Confusion Matrix:

Predicted  business  health  __all__
Actual
business         14       6       20
health            0       0        0
__all__          14       6       20


Overall Statistics:

Accuracy: 0.7
95% CI: (0.45721081772371086, 0.88106840959427235)
No Information Rate: ToDo
P-Value [Acc > NIR]: 0.608009812201
Kappa: 0.0
Mcnemar's Test P-Value: ToDo


Class Statistics:

Classes                                 business health
Population                                    20     20
P: Condition positive                         20      0
N: Condition negative                          0     20
Test outcome positive                         14      6
Test outcome negative                          6     14
TP: True Positive                             14      0
TN: True Negative                              0     14
FP: False Positive                             0      6
FN: False Negative                             6      0
TPR: (Sensitivity, hit rate, recall)         0.7    NaN
TNR=SPC: (Specificity)                       NaN    0.7
PPV: Pos Pred Value (Precision)                1      0
NPV: Neg Pred Value                            0      1
FPR: False-out                               NaN    0.3
FDR: False Discovery Rate                      0      1
FNR: Miss Rate                               0.3    NaN
ACC: Accuracy                                0.7    0.7
F1 score                               0.8235294      0
MCC: Matthews correlation coefficient        NaN    NaN
Informedness                                 NaN    NaN
Markedness                                     0      0
Prevalence                                     1      0
LR+: Positive likelihood ratio               NaN    NaN
LR-: Negative likelihood ratio               NaN    NaN
DOR: Diagnostic odds ratio                   NaN    NaN
FOR: False omission rate                       1      0
scls
  • What, how did you get this to work? With the latest pandas_ml it's giving me a blank confusion matrix (all 0's), and the labels are True/False instead of business and health. – wordsforthewise Feb 01 '18 at 21:00
  • the same, it is blank – Elham Nov 06 '18 at 16:33
  • I am getting AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score' with scikit-learn version 0.23.1 and pandas-ml version 0.6.1. I have tried other versions as well with no luck. – petra Jul 06 '20 at 13:48
  • How did you turn the x-axis labels diagonal? – Jürgen K. May 20 '21 at 14:03
  • https://github.com/pandas-ml/pandas-ml/blob/26717cc33ddc3548b023a6410b2235fb21a7b382/pandas_ml/confusion_matrix/abstract.py#L235 – scls May 21 '21 at 15:44
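For readers who hit the errors described in these comments, some of the per-class statistics shown above can also be obtained from scikit-learn alone. The sketch below swaps in `sklearn.metrics.precision_recall_fscore_support`, which is a different tool from pandas-ml and not part of the answer; `zero_division=0` (an assumption made here, available in scikit-learn 0.22 and later) reports 0 where pandas-ml prints NaN.

```python
from sklearn.metrics import precision_recall_fscore_support

# Data from the question
y_test = ['business'] * 20
pred = ['health', 'business', 'business', 'business', 'business',
        'business', 'health', 'health', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'health', 'health', 'business', 'health']

labels = ['business', 'health']
precision, recall, f1, support = precision_recall_fscore_support(
    y_test, pred, labels=labels, zero_division=0)

# One line of statistics per class, in the order given by `labels`
for name, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{name:10s} precision={p:.3f} recall={r:.3f} f1={f:.3f} support={s}")
```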
13
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.metrics import confusion_matrix

    # model, train_x, train_y, test_x and test_y (one-hot encoded) are assumed to exist
    model.fit(train_x, train_y, validation_split=0.1, epochs=50, batch_size=4)
    y_pred = model.predict(test_x, batch_size=15)
    # argmax turns one-hot / probability rows into integer class ids
    cm = confusion_matrix(test_y.argmax(axis=1), y_pred.argmax(axis=1))
    index = ['neutral', 'happy', 'sad']
    columns = ['neutral', 'happy', 'sad']
    cm_df = pd.DataFrame(cm, index=index, columns=columns)
    plt.figure(figsize=(10, 6))
    sns.heatmap(cm_df, annot=True, fmt='d')

Confusion matrix

Rahul Verma
9

There is a very easy way to do this using ConfusionMatrixDisplay. Its display_labels argument sets the class names shown on the plot's axes.

import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
np.random.seed(0)
y_true = np.random.randint(0,3, 100)
y_pred = np.random.randint(0,3, 100)

labels = ['cat', 'dog', 'rat']

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=labels).plot()
#plt.savefig("Confusion_Matrix.png")


Ref: ConfusionMatrixDisplay

Edit 1:

To rotate the x-axis labels to vertical (needed when class labels overlap in the plot) and to plot directly from predictions:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
np.random.seed(0)

n = 10
y_true = np.random.randint(0,n, 100)
y_pred = np.random.randint(0,n, 100)

labels = [f'class_{i+1}' for i in range(n)]

fig, ax = plt.subplots(figsize=(15, 15))
ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred, display_labels=labels, xticks_rotation="vertical",
    ax=ax, colorbar=False, cmap="plasma")


mujjiga
  • This helped a lot! Thank you! Is there a way to represent the X-axis labels in a vertical fashion? I am using a multiclass evaluation so my labels are getting overlapped with one another when I want to show the confusion matrix. – raiyan22 Nov 22 '22 at 08:25
  • Could you tell how can I set the label on the top of the figure? it was ax.xaxis.tick_top() in seaborn but now I see no way to do it in the official scikitlearn page. Thank you :) – raiyan22 Nov 29 '22 at 09:12
  • @raiyan22 `ax.xaxis.tick_top()` should work. Call just after creating `subplots` – mujjiga Nov 30 '22 at 08:10
3

Given model, validx, validy. With great help from other answers, this is what fits my needs.

sklearn.metrics.plot_confusion_matrix

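import matplotlib.pyplot as plt
import sklearn.metrics

fig, ax = plt.subplots(figsize=(26,26))
# plot_confusion_matrix was removed in scikit-learn 1.2; on newer versions use
# sklearn.metrics.ConfusionMatrixDisplay.from_estimator(model, validx, validy, ax=ax, cmap=plt.cm.Blues)
sklearn.metrics.plot_confusion_matrix(model, validx, validy, ax=ax, cmap=plt.cm.Blues)
ax.set(xlabel='Predicted', ylabel='Actual', title='Confusion Matrix Actual vs Predicted')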
BSalita
0
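import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.metrics import ConfusionMatrixDisplay

# X_train, y_train, X_test, y_test, class_names, normalize and title are assumed to be defined already
classifier = svm.SVC(kernel="linear", C=0.01).fit(X_train, y_train)
disp = ConfusionMatrixDisplay.from_estimator(
    classifier,
    X_test,
    y_test,
    display_labels=class_names,
    cmap=plt.cm.Blues,
    normalize=normalize,
)

disp.ax_.set_title(title)  # this line is your answer

plt.show()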