
I am plotting a confusion matrix for multi-label data, where the labels look like:

label1: 1, 0, 0, 0

label2: 0, 1, 0, 0

label3: 0, 0, 1, 0

label4: 0, 0, 0, 1

I can classify successfully using the code below; I only need help plotting the confusion matrix.

    for i in range(4):
        y_train= y[:,i]
        print('Train subject %d, class %s' % (subject, cols[i]))
        lr.fit(X_train[::sample,:],y_train[::sample])
        pred[:,i] = lr.predict_proba(X_test)[:,1]

I used the following code to print the confusion matrix, but it always returns a 2x2 matrix:

    prediction = lr.predict(X_train)
    print(confusion_matrix(y_train, prediction))
Ajit Medhekar
tourist

7 Answers


I found a function that plots a confusion matrix generated by sklearn.

import numpy as np


def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix

    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']

    title:        the text to display at the top of the matrix

    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues

    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    --------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    """
    import matplotlib.pyplot as plt
    import numpy as np
    import itertools

    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    # normalize before drawing, so the cell colours match the printed proportions
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")


    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.tight_layout()  # after the labels are set, so they are not clipped
    plt.show()

It will look like this: [image: example confusion matrix plot]
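The 2x2 result in the question comes from passing a single binary column (`y_train = y[:, i]` from the last loop iteration) to `confusion_matrix`. One way to get the 4x4 `cm` this function expects is to collapse both the one-hot labels and the per-class probabilities in `pred` to one class index each with `argmax`. A minimal sketch, assuming `y_test` is the one-hot label matrix for `X_test` and `pred` is the array filled by the question's loop; the numbers are made up for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical stand-ins for the question's data: one-hot ground truth
# (one column per label) and `pred`, filled column by column with
# lr.predict_proba(X_test)[:, 1] in the one-vs-rest loop.
y_test = np.eye(4, dtype=int)[[0, 1, 2, 3, 1]]
pred = np.array([[0.9, 0.1, 0.2, 0.1],
                 [0.2, 0.7, 0.1, 0.3],
                 [0.1, 0.2, 0.8, 0.1],
                 [0.3, 0.1, 0.2, 0.6],
                 [0.1, 0.4, 0.5, 0.2]])

# One class index per sample: the true label, and the label whose
# binary classifier was most confident.
true_idx = y_test.argmax(axis=1)
pred_idx = pred.argmax(axis=1)

# labels=np.arange(4) keeps the matrix 4x4 even if a class never occurs.
cm = confusion_matrix(true_idx, pred_idx, labels=np.arange(4))
print(cm)
```

The resulting `cm` can then be passed to `plot_confusion_matrix` above, e.g. with `target_names=['label1', 'label2', 'label3', 'label4']`. Because each column of `pred` comes from an independently trained binary classifier, a row need not sum to 1; `argmax` simply picks the most confident one.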

Calvin Duy Canh Tran

This works best for me:

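from sklearn.metrics import multilabel_confusion_matrix
import numpy as np

y_unique = np.unique(y_test)  # works for NumPy arrays and pandas Series
# one 2x2 matrix [[TN, FP], [FN, TP]] per label, in the order of y_unique
mcm = multilabel_confusion_matrix(y_test, y_pred, labels=y_unique)
print(mcm)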
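`multilabel_confusion_matrix` has been part of `sklearn.metrics` since scikit-learn 0.21. It does not return one n x n matrix: it returns one 2x2 matrix per label, laid out as `[[TN, FP], [FN, TP]]`. It accepts 1-D class labels or a 0/1 indicator matrix such as the one-hot rows in the question. A small sketch with made-up data showing how to read the blocks:

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# One-hot rows in the question's format (4 labels); the values are made up.
Y_true = np.eye(4, dtype=int)[[0, 1, 2, 3, 1]]
Y_pred = np.eye(4, dtype=int)[[0, 2, 2, 3, 1]]

mcm = multilabel_confusion_matrix(Y_true, Y_pred)  # shape (4, 2, 2)

# Each block is [[TN, FP], [FN, TP]] for one label (column).
tn, fp, fn, tp = mcm[:, 0, 0], mcm[:, 0, 1], mcm[:, 1, 0], mcm[:, 1, 1]
recall = tp / (tp + fn)  # per-label recall

print(mcm[1])  # block for the second label
print(recall)
```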

I see this is still an open issue in sklearn's repository:

https://github.com/scikit-learn/scikit-learn/issues/3452

However, there have been some attempts at implementing it. From the same issue thread (#3452):

https://github.com/Magellanea/scikit-learn/commit/514287c1d5dad2f0ab4918dc4da5cf7053fe6734#diff-b04acd877dd793f28ae7be13a999ed88R187

You can check the code proposed in the function and see if that fits your needs.

Guiem Bosch
  • I replaced confusion_matrix with multilabel_confusion_matrix, but it gives the error "name 'multilabel_confusion_matrix' is not defined". Is there a workaround? The issue seems to be open on GitHub. – tourist Aug 19 '16 at 08:30
  • Exactly, as I said: "still an open issue". I just gave you the links to the code in case you wanted to try it. It is not in sklearn's code, which is why it says it is not defined. If you want to use it (I didn't try it), you should copy all the code of `multilabel_confusion_matrix` into your own code and call the function. Be careful: this has been an open issue since 2014, and the fact that it is still open may suggest it is not trivial. I just gave you a pointer in case you wanted to try it yourself and maybe solve it on your own. Good luck! – Guiem Bosch Aug 20 '16 at 06:56
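On a scikit-learn version older than 0.21, where `multilabel_confusion_matrix` does not exist, the same per-label counts can be computed directly with NumPy from 0/1 indicator matrices. A minimal sketch; `per_label_confusion` is a hypothetical helper name, not a library function, and the data are made up:

```python
import numpy as np

def per_label_confusion(Y_true, Y_pred):
    """Hypothetical helper: one [[TN, FP], [FN, TP]] block per label column,
    computed from 0/1 indicator matrices of shape (n_samples, n_labels)."""
    Y_true = np.asarray(Y_true).astype(bool)
    Y_pred = np.asarray(Y_pred).astype(bool)
    tp = np.sum(Y_true & Y_pred, axis=0)
    fp = np.sum(~Y_true & Y_pred, axis=0)
    fn = np.sum(Y_true & ~Y_pred, axis=0)
    tn = np.sum(~Y_true & ~Y_pred, axis=0)
    # stack into shape (n_labels, 2, 2): row 0 = [TN, FP], row 1 = [FN, TP]
    return np.stack([np.stack([tn, fp], axis=1),
                     np.stack([fn, tp], axis=1)], axis=1)

# Made-up one-hot data with 4 labels, as in the question.
Y_true = np.eye(4, dtype=int)[[0, 1, 2, 3, 1]]
Y_pred = np.eye(4, dtype=int)[[0, 2, 2, 3, 1]]
print(per_label_confusion(Y_true, Y_pred))
```

It uses the same `[[TN, FP], [FN, TP]]` layout per label as `multilabel_confusion_matrix` in newer releases.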
from sklearn.metrics import multilabel_confusion_matrix

# one 2x2 matrix [[TN, FP], [FN, TP]] per label, in the order given by `labels`
mul_c = multilabel_confusion_matrix(
    test_Y,
    pred_k,
    labels=["benign", "dos", "probe", "r2l", "u2r"])
print(mul_c)
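With string labels, the `labels` argument fixes the order of the returned blocks, so `mcm[k]` describes `labels[k]`. A sketch with made-up data that pairs each block with its label name:

```python
from sklearn.metrics import multilabel_confusion_matrix

# Made-up string labels in the style of the answer above.
y_true = ["dos", "benign", "probe", "dos"]
y_pred = ["dos", "benign", "dos", "benign"]
labels = ["benign", "dos", "probe"]

mcm = multilabel_confusion_matrix(y_true, y_pred, labels=labels)

# mcm[k] belongs to labels[k]; pair them up so the output is readable.
for name, block in zip(labels, mcm):
    (tn, fp), (fn, tp) = block
    print(f"{name}: TP={tp} FP={fp} FN={fn} TN={tn}")
```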
zaplec
Govardhan
  • While this code may provide a solution to the question, it's better to add context as to why/how it works. This can help future users learn, and apply that knowledge to their own code. You are also likely to have positive feedback from users in the form of upvotes, when the code is explained. – borchvm May 05 '20 at 06:48

I found an easy solution using the sklearn and seaborn libraries.

import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report

def plot_confusion_matrix(y_test, y_scores, classNames):
    # convert one-hot ground truth and predicted probabilities to class indices
    y_test = np.argmax(y_test, axis=1)
    y_scores = np.argmax(y_scores, axis=1)
    classes = len(classNames)
    labels = np.arange(classes)  # keep every class, even if absent from the data
    cm = confusion_matrix(y_test, y_scores, labels=labels)
    print("**** Confusion Matrix ****")
    print(cm)
    print("**** Classification Report ****")
    print(classification_report(y_test, y_scores, labels=labels, target_names=classNames))
    # normalize each row (true class) to proportions; empty rows stay 0
    row_sums = cm.sum(axis=1, keepdims=True)
    con = np.divide(cm, row_sums, out=np.zeros(cm.shape), where=row_sums != 0)

    plt.figure(figsize=(40, 40))
    sns.set(font_scale=3.0)  # for label size
    ax = sns.heatmap(con, annot=True, fmt='.2f', cmap='Blues',
                     xticklabels=classNames, yticklabels=classNames)
    ax.figure.savefig("image2.png")

# y_test is your ground truth (one-hot), y_scores your predicted probabilities
classNames = ['A', 'B', 'C', 'D', 'E']
plot_confusion_matrix(y_test, y_scores, classNames)
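The normalization step above divides each row of `cm` by its total, so every row (true class) sums to 1 and the diagonal holds each class's recall. Since scikit-learn 0.22, `confusion_matrix` can do this itself via `normalize='true'`. A small check with made-up labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Made-up class indices.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]

cm = confusion_matrix(y_true, y_pred)
# manual row normalization, as in the function above
manual = cm / cm.sum(axis=1, keepdims=True)
# the same thing via the built-in option (scikit-learn 0.22+)
builtin = confusion_matrix(y_true, y_pred, normalize="true")

print(np.allclose(manual, builtin))
# the diagonal of the row-normalized matrix is each class's recall
print(np.diag(builtin), recall_score(y_true, y_pred, average=None))
```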
Amin Ullah

Just use pandas with gradient coloring:

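import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

labels = np.union1d(y_true, y_pred)  # every class seen in truth or prediction
cm = confusion_matrix(y_true, y_pred, labels=labels)
cm = pd.DataFrame(data=cm, columns=labels, index=labels)
cm = cm.div(cm.sum(axis=1), axis=0)  # each row (true class) as fractions of 1
cm.style.background_gradient().format(precision=2)  # renders in a notebook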

Recent pandas versions have good options for table formatting and styling (`Styler.format(precision=...)` requires pandas 1.3 or later).
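If the data are already in pandas, `pd.crosstab` can build a labeled, row-normalized table directly. One caveat: it only includes classes that actually occur, so a class that is never predicted gets no column, whereas `confusion_matrix(..., labels=...)` keeps it. A sketch with made-up labels:

```python
import pandas as pd

# Made-up labels; "b" is never predicted.
y_true = ["a", "b", "a", "c"]
y_pred = ["a", "a", "a", "c"]

# normalize="index" divides each row (true class) by its total
ct = pd.crosstab(pd.Series(y_true, name="True"),
                 pd.Series(y_pred, name="Predicted"),
                 normalize="index")
print(ct)
```

The result is an ordinary DataFrame, so `ct.style.background_gradient()` can be applied to it in the same way.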

Poe Dator

Another simple approach, using seaborn's heatmap with a pandas DataFrame:

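import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import metrics

# order rows/columns by model.classes_ so the tick labels match the counts
cm = metrics.confusion_matrix(y_true=y_test, y_pred=y_test_pred,
                              labels=model.classes_)
mc_df = pd.DataFrame(cm, index=model.classes_, columns=model.classes_)
sns.heatmap(mc_df, annot=True, fmt="d", cmap=plt.get_cmap('Blues'))
plt.title("Confusion Matrix")
plt.show()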
