
I have written the following code to classify multiclass data.

import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from itertools import cycle
import pandas as pd 

##########################################################################################
df = pd.read_csv('merged_Zero_Cor_cleaned.tsv',sep='\t')

X = df.drop(columns='class')
y = df['class']

y_bin = label_binarize(y, classes=[0, 1, 2, 3, 4])
n_classes = y_bin.shape[1]

clf = OneVsRestClassifier(QDA())
y_score = cross_val_predict(clf, X, y, cv=10, method='predict_proba')
y_pred = cross_val_predict(clf, X, y, cv=10)

lw = 2

fpr = dict()
tpr = dict()
roc_auc = dict()

# Compute ROC curve and AUC for each class (one-vs-rest)
for i in range(n_classes):
    # Replace any NaN probabilities with 0 before computing the curve
    class_scores = pd.DataFrame(y_score[:, i]).fillna(0)
    fpr[i], tpr[i], _ = roc_curve(y_bin[:, i], class_scores.values.ravel())
    roc_auc[i] = auc(fpr[i], tpr[i])

colors = cycle(['blue', 'red', 'green','black', 'brown'])

for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.2f})'
             ''.format(i, roc_auc[i]))

plt.plot([0, 1], [0, 1], 'k--', lw=lw)

plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic for multi-class data')
plt.legend(loc="lower right")
plt.show()

##########################################################################################

In the above code, to check the performance I am computing the predicted probability scores and the predicted labels in two separate calls:

y_score = cross_val_predict(clf, X, y, cv=10 ,method='predict_proba')
y_pred = cross_val_predict(clf, X, y, cv=10 )

This is computationally expensive. Is there a way to get both outputs from a single call?

Update

Alternatively, how can I interpret the predicted class from these probabilities?

      0         1         2         3              4
0      0.0  0.250000  0.250000  0.250000   2.500000e-01
1      0.0  0.000000  0.000000  1.000000   0.000000e+00
2      0.0  0.250000  0.250000  0.250000   2.500000e-01
3      0.0  0.000000  0.333333  0.333333   3.333333e-01
4      0.0  0.000000  0.000000  1.000000   0.000000e+00
5      0.0  0.000000  0.000000  1.000000   8.744693e-23
6      0.0  0.333333  0.333333  0.333333   9.255446e-105
  • From the docs: "For method=’predict_proba’, the columns correspond to the classes in sorted order." So then you should have an array of predicted probabilities, and the column with the highest probability should match the class predicted by `predict`, so you shouldn't need to also predict the class – G. Anderson Sep 09 '19 at 16:21
  • @G.Anderson thanks for your reply. I have updated my question; can you suggest how I can interpret the class from the probability scores, as per your suggestion? – jax Sep 09 '19 at 18:48
  • You can use the answers discussed in [this question](https://stackoverflow.com/questions/39256287/how-to-get-classes-labels-from-cross-val-predict-used-with-predict-proba-in-scik) and/or [this question](https://stackoverflow.com/questions/16858652/how-to-find-the-corresponding-class-in-clf-predict-proba) to get the class labels, and `np.argmax()` to get the index of the highest probability in each row – G. Anderson Sep 09 '19 at 19:46
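Following up on the comments above: because `cross_val_predict` with method='predict_proba' returns one column per class in sorted order, the predicted labels can be recovered from the probability matrix with `np.argmax`, so a single cross-validated call is enough. A minimal sketch (not part of the original post, and assuming the class labels are exactly the sorted values 0 to 4, as used in `label_binarize` above):

import numpy as np

# One cross-validated call that returns per-class probabilities
y_score = cross_val_predict(clf, X, y, cv=10, method='predict_proba')

# Columns correspond to the classes in sorted order (0, 1, 2, 3, 4 here),
# so the predicted label of each row is the class whose column holds the
# highest probability
classes = np.sort(np.unique(y))
y_pred = classes[np.argmax(y_score, axis=1)]

For tied rows, such as the uniform 0.25 rows in the table above, np.argmax simply picks the first of the tied columns.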
