
I'm running sklearn.linear_model.LogisticRegression on a multi-class problem. As I understand it, the coef_ attribute holds one coefficient per feature for each class. What I don't understand is how to interpret these coefficients in sklearn. In SPSS, for example, one class serves as the baseline and the odds are interpreted relative to that class, so you get coefficients for only n-1 classes. That is not the case in sklearn, where I get coefficients for every class.

Example exponentiated coefficients for one feature (for four classes) are:

1.1649 | 1.0660 | 0.9589 | 0.8607

Is this interpretation correct: with a one-unit increase in this feature, the probability of an instance belonging to the first class increases by ~16% and to the second class by ~7%, while it decreases for the third and fourth classes?
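To make the question concrete, here is a minimal sketch of what the exponentiated coefficients would imply if the model were a true softmax (multinomial) model, which is an assumption and may not match how sklearn fitted it. Under that assumption, a one-unit increase multiplies each class's unnormalized score by its exponentiated coefficient, so the odds between two classes change by a fixed ratio, while the change in probability depends on where you start. The starting probabilities and the helper name `shift_one_unit` are made up for illustration.

```python
# Exponentiated coefficients for one feature, copied from the question.
EXP_COEFS = [1.1649, 1.0660, 0.9589, 0.8607]


def shift_one_unit(probs, exp_coefs):
    """Class probabilities after a one-unit increase in the feature.

    Assumes a softmax model with all other features held fixed: each
    class's unnormalized score is multiplied by its exponentiated
    coefficient, then the scores are renormalized to sum to 1.
    """
    scaled = [p * e for p, e in zip(probs, exp_coefs)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Two hypothetical starting points (not taken from the question).
for start in ([0.25, 0.25, 0.25, 0.25], [0.70, 0.10, 0.10, 0.10]):
    after = shift_one_unit(start, EXP_COEFS)
    # Relative change in the probability of the first class.
    rel_change = after[0] / start[0] - 1.0
    # Change in the odds of class 1 versus class 4.
    odds_ratio = (after[0] / after[3]) / (start[0] / start[3])
    print(f"start={start}  class-1 change={rel_change:+.1%}  "
          f"odds ratio 1 vs 4={odds_ratio:.4f}")
```

In this sketch the odds ratio between the first and fourth class is the same for both starting points, while the relative change in the first class's probability is not, so a flat "+16% probability" reading would not hold under the softmax assumption.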

Also, how can I calculate the p-value for the coefficients?
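For reference, the usual route to a p-value for a single coefficient is a Wald test, which needs a standard error in addition to the coefficient itself. The sketch below shows only that last step, and it rests on two assumptions: that a standard error is available from somewhere (`LogisticRegression` does not report one), and that the fit is unregularized, since sklearn's default L2 penalty makes the textbook Wald test questionable. The standard error value used here is hypothetical.

```python
import math


def wald_p_value(coef, std_err):
    """Two-sided p-value for H0: coef == 0 under a normal approximation."""
    z = coef / std_err
    # 2 * (1 - Phi(|z|)) written via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Log of the first exponentiated coefficient from the question.
coef = math.log(1.1649)
# Hypothetical standard error, NOT something sklearn provides.
std_err = 0.05

print(wald_p_value(coef, std_err))
```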

arikuja
  • Note that `LogisticRegression` doesn't really compute multinomial LR (except in the still-to-be-released scikit-learn 0.16, since I contributed it, and even then it requires setting a flag). Instead, it fits `n_classes` binary LR models and normalizes the probabilities. This is a hack that works fine for predictive purposes, but if your interest is modeling and p-values, maybe scikit-learn isn't the toolkit for you. – Fred Foo Nov 04 '14 at 20:23
  • Larsmans, I'm trying to compare the coefficients from scikit to the coefficients from Matlab's mnrfit (a multinomial logistic regression function). Are you saying that scikit will not be able to provide comparable coefficients? Also, you say that scikit may not be the right toolkit. What toolkit would you recommend instead? Thanks. – StemOner Jul 17 '15 at 15:42
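
The distinction drawn in the first comment can be sketched as follows: normalizing the outputs of `n_classes` binary (one-vs-rest) models is not the same as applying a softmax. The sketch feeds the same hypothetical per-class linear scores into both rules purely to compare the normalizations; in practice the two approaches would also produce different fitted coefficients, which is one reason they may not line up with a multinomial routine. The function names and scores are illustrative only.

```python
import math


def ovr_normalized(scores):
    """One-vs-rest style: sigmoid per class, then rescale to sum to 1."""
    sigmoids = [1.0 / (1.0 + math.exp(-z)) for z in scores]
    total = sum(sigmoids)
    return [s / total for s in sigmoids]


def softmax(scores):
    """Multinomial style: exponentiate the scores and normalize."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical linear scores (intercept + coef * x) for three classes.
scores = [2.0, 0.5, -1.0]
print("one-vs-rest, normalized:", ovr_normalized(scores))
print("softmax:                ", softmax(scores))
```

Both rules pick the same most likely class for these scores, but the probabilities differ, which fits the remark that the approach works for prediction while being harder to interpret as a model.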
