
I have a dataset as given below, where A, B, C, D, E are features and 'T' is the target variable.

A     B    C     D     E       T
32    22   55    76    98      3
12    41   90    56    33      2
31    78   99    67    89      1
51    85   71    21    37      1
......
......

Now, I have applied a multiclass logistic regression classifier using scikit-learn and obtained the predicted values and a matrix of probabilities:

 A     B    C     D     E       T   Predicted    Probability
32    22   55    76    98       3     3           0.35
12    41   90    56    33       2     1           0.68
31    78   99    67    89       1     3           0.31
51    85   71    21    37       1     1           0.25

Now I just want to ask how to interpret the outcome probabilities. 1) As far as I have studied, Python by default gives the probability of the event being 1; if that is the case, is 0.35 the probability of the event being 1? OR 2) is the value 0.35 the probability that the first case belongs to class "3"? And how could I calculate the probabilities for the remaining two classes, something like:

 A     B    C     D     E       T   Predicted     P_1    P_2    P_3
32    22   55    76    98       3     3           0.35   0.20   0.45
12    41   90    56    33       2     1           0.68   0.10   0.22
31    78   99    67    89       1     3           0.31   0.40   0.29
51    85   71    21    37       1     1           0.25   0.36   0.39
  • Please include a sample of your code used for model building & getting predictions (normally, `scikit-learn` would return the probabilities for each one of your classes) – desertnaut Jan 26 '18 at 10:57

2 Answers

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(random_state=1)
lr.fit(x_train, y_train)

We fit the model on the training data.

lr.predict_proba(x_test)

Suppose the dataset contains three classes. The output will be something like:

array([[  2.69011925e-02,   5.40807755e-01,   4.32291053e-01],
   [  9.32525056e-01,   6.73606657e-02,   1.14278375e-04],
   [  5.24023874e-04,   3.24718067e-01,   6.74757909e-01],
   [  4.75066650e-02,   5.86482429e-01,   3.66010906e-01],
   [  1.83396339e-02,   4.77753541e-01,   5.03906825e-01],
   [  8.82971089e-01,   1.16720108e-01,   3.08803089e-04],
   [  4.64149328e-02,   7.17011933e-01,   2.36573134e-01],
   [  1.65574625e-02,   3.29502329e-01,   6.53940209e-01],
   [  8.70375470e-01,   1.29512862e-01,   1.11667567e-04],
   [  8.51328361e-01,   1.48584654e-01,   8.69851797e-05]])

In the output array, each row represents a sample and each of its 3 columns holds the probability of the corresponding class.
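
The column order follows the classifier's `classes_` attribute. A minimal sketch, reusing the fitted `lr` and `x_test` from above (names assumed):

probs = lr.predict_proba(x_test)

# Column j of probs holds the probability of class lr.classes_[j],
# and every row sums to 1 (up to floating-point rounding).
print(lr.classes_)        # e.g. array([1, 2, 3])
print(probs.sum(axis=1))  # e.g. array([1., 1., 1., ...])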

lr.predict_proba(x_test)[0, :]   # or lr.predict_proba(x_test[0:1, :]), which keeps the 2-D shape

Output:

array([ 0.02690119,  0.54080775,  0.43229105])

i.e. the class probabilities for the first test sample.
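
To get the per-class columns the question asks for (P_1, P_2, P_3 next to the prediction), the same array can be wrapped in a DataFrame. A minimal sketch, assuming pandas is available and `y_test` holds the true labels:

import numpy as np
import pandas as pd

probs = lr.predict_proba(x_test)
result = pd.DataFrame(probs, columns=['P_%s' % c for c in lr.classes_])
result['Predicted'] = lr.predict(x_test)  # predicted class label per sample
result['T'] = np.asarray(y_test)          # true labels, as in the question's table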

Faraz Gerrard Jamal
  • Could you clarify the relationship between the column indices and the class labels? I don't think it would be clear to a beginner from what you have written. – Matti Lyra Jan 28 '18 at 10:27

Not sure where your results table came from (which API call(s)), but your second hypothesis is correct. In the table below

 A     B    C     D     E       T   Predicted    Probability
32    22   55    76    98       3     3           0.35
12    41   90    56    33       2     1           0.68
31    78   99    67    89       1     3           0.31
51    85   71    21    37       1     1           0.25

you have the results of what I assume are 4 different samples (instances), with the target variable (correct class), predicted class, and the probability of the predicted class.

I think you have a problem with an indexing routine in your code. Let's focus on the last row

 A     B    C     D     E       T   Predicted    Probability
51    85   71    21    37       1     1           0.25

The probability of the predicted class is 0.25, or 25%, and you have a three-class problem. This means the total probability mass for the other two classes is 1 - 0.25 = 0.75. If you divide that 75% evenly between the two remaining classes (the ones the classifier supposedly did not predict), you get 0.75 / 2 = 0.375, or 37.5%, for each of classes 2 and 3 (the prediction was class 1). Of course the classifier won't assign exactly equal probability to classes 2 and 3, so one will be lower and the other higher. The problem is that 37.5% is already higher than the 25% probability of the predicted class 1, which is not logically possible: if the classifier gives 37.5% to class 2 and 25% to class 1, then the prediction should be class 2, not class 1 as in your table.
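
One quick way to confirm that the inconsistency comes from how the table was assembled, rather than from the classifier itself, is to check that scikit-learn's prediction always matches the highest-probability column. A minimal sketch, assuming the fitted `lr` and `x_test` names from the other answer:

import numpy as np

probs = lr.predict_proba(x_test)
pred = lr.predict(x_test)

# predict() returns the class with the largest predicted probability,
# so this comparison should print True.
print(np.array_equal(lr.classes_[np.argmax(probs, axis=1)], pred))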

The output from logistic regression is a table of probabilities with a row for each instance and a column for each class, for example

probs = array([[  2.69011925e-02,   5.40807755e-01,   4.32291053e-01],
   [  9.32525056e-01,   6.73606657e-02,   1.14278375e-04],
   [  5.24023874e-04,   3.24718067e-01,   6.74757909e-01],
   [  8.70375470e-01,   1.29512862e-01,   1.11667567e-04],
   [  8.51328361e-01,   1.48584654e-01,   8.69851797e-05]])

The probability of the 3rd class for the first instance is in the third column of the first row, `probs[0, 2]`. If you want the predicted classes from the array you can do `predicted_idx = np.argmax(probs, axis=1)`, which gives you `array([1, 0, 2, 0, 0])` for the above data - the column index of the highest predicted probability. You can then extract the probability of only the predicted classes by

probs[range(probs.shape[0]), predicted_idx]
>> array([ 0.54080776,  0.93252506,  0.67475791,  0.87037547, 0.85132836])
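
Putting those two steps together as a runnable snippet on the example probs array above:

import numpy as np

probs = np.array([[2.69011925e-02, 5.40807755e-01, 4.32291053e-01],
                  [9.32525056e-01, 6.73606657e-02, 1.14278375e-04],
                  [5.24023874e-04, 3.24718067e-01, 6.74757909e-01],
                  [8.70375470e-01, 1.29512862e-01, 1.11667567e-04],
                  [8.51328361e-01, 1.48584654e-01, 8.69851797e-05]])

predicted_idx = np.argmax(probs, axis=1)  # array([1, 0, 2, 0, 0])
predicted_probs = probs[np.arange(len(probs)), predicted_idx]
print(predicted_probs)  # [0.54080776 0.93252506 0.67475791 0.87037547 0.85132836]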

Finally, you have to keep in mind that the column index in the results table does not necessarily correspond to the way your data set is indexed. If you use something like `sklearn.preprocessing.LabelEncoder`, it may be the case that the class you thought was at index 0 is not in fact at index 0. You can check this from the encoder's `classes_` attribute (or directly from the fitted classifier's, e.g. `lr.classes_`) - the order of that array corresponds to the column indices in the probability array you get from logistic regression.
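
For example, if the original labels are 1, 2 and 3 as in the question, a minimal sketch of checking that ordering (assuming `y_train` holds the original labels and `lr` is the fitted classifier):

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_encoded = le.fit_transform(y_train)

print(le.classes_)  # e.g. array([1, 2, 3]); column j of the probability array is class le.classes_[j]
print(lr.classes_)  # the fitted estimator exposes the same (sorted) ordering directly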

Matti Lyra