
I need to calculate coefficients of a multiple logistic regression using sklearn:

X =

x1          x2          x3   x4         x5    x6
0.300000    0.100000    0.0  0.0000     0.5   0.0
0.000000    0.006000    0.0  0.0000     0.2   0.0
0.010000    0.678000    0.0  0.0000     2.0   0.0
0.000000    0.333000    1.0  12.3966    0.1   4.0
0.200000    0.005000    1.0  0.4050     1.0   0.0
0.000000    0.340000    1.0  15.7025    0.5   0.0
0.000000    0.440000    1.0  8.2645     0.0   4.0
0.500000    0.055000    1.0  18.1818    0.0   4.0

The values of y are categorical, ranging from 1 to 4.

y =

1
2
1
3
4
1
2
3

This is what I do:

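import numpy as np
from sklearn import linear_model

# X (8 samples x 6 features) and y (class labels 1-4) from above, as NumPy arrays
X = np.array([[0.30, 0.100, 0.0, 0.0000, 0.5, 0.0],
              [0.00, 0.006, 0.0, 0.0000, 0.2, 0.0],
              [0.01, 0.678, 0.0, 0.0000, 2.0, 0.0],
              [0.00, 0.333, 1.0, 12.3966, 0.1, 4.0],
              [0.20, 0.005, 1.0, 0.4050, 1.0, 0.0],
              [0.00, 0.340, 1.0, 15.7025, 0.5, 0.0],
              [0.00, 0.440, 1.0, 8.2645, 0.0, 4.0],
              [0.50, 0.055, 1.0, 18.1818, 0.0, 4.0]])
y = np.array([1, 2, 1, 3, 4, 1, 2, 3])

# A large C means very weak regularization
logreg = linear_model.LogisticRegression(C=1e5)

logreg.fit(X, y)

# print the coefficients
print(logreg.intercept_)
print(logreg.coef_)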

However, I get 6 columns in the output of logreg.intercept_ and 6 columns in the output of logreg.coef_. How can I get one coefficient per feature, i.e. the values a to f in the following formula?

y = a*x1 + b*x2 + c*x3 + d*x4 + e*x5 + f*x6

Also, I am probably doing something wrong, because y_pred = logreg.predict(X) returns 1 for every row.

Markus
  • If you are doing multi-class logistic regression, you are going to have one coefficient per class/feature combination. So you won't have 1 coefficient per feature; you will have `n` coefficients per feature, where `n` is your number of classes. – Peter Jan 29 '18 at 19:01
  • @Xochipilli: So, what should I do if I want to get the formula from my question (the coefficients `a`-`f`)? I cannot use multiple linear regression, because `y` is categorical. – Markus Jan 29 '18 at 19:03
  • "because y_pred = logreg.predict(X) gives me the value of 1 for all rows": maybe there is not enough data to fit the model and differentiate between the classes. – Vivek Kumar Jan 30 '18 at 02:23
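For reference, here is a sketch of how a multiclass logistic regression turns its per-class coefficients into a single prediction, assuming the 4 classes and 6 features from the question. Row k of coef_ and entry k of intercept_ (written here as w_k and b_k) define one linear score per class, and scikit-learn predicts the class with the highest score. Whether probabilities come from a softmax over all classes (multinomial) or from separate one-vs-rest sigmoids depends on the scikit-learn version and solver; the softmax form is shown as an illustration.

$$
z_k = b_k + w_{k,1} x_1 + w_{k,2} x_2 + \dots + w_{k,6} x_6, \qquad k = 1, \dots, 4
$$

$$
P(y = k \mid x) = \frac{e^{z_k}}{\sum_{j=1}^{4} e^{z_j}}, \qquad \hat{y} = \arg\max_{k} z_k
$$

So instead of one formula y = a*x1 + ... + f*x6, the model has four score formulas, and the predicted class is whichever score is largest.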

1 Answer


Check the online documentation:

coef_ : array, shape (1, n_features) or (n_classes, n_features)

Coefficient of the features in the decision function.

coef_ is of shape (1, n_features) when the given problem is binary.

As @Xochipilli has already mentioned in the comments, you are going to have (n_classes, n_features) coefficients, in your case (4, 6), and 4 intercepts (one for each class).
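A quick way to confirm the shapes described above is to fit the question's data and inspect the attributes. This is a minimal sketch: max_iter=10000 is an assumption added only to give the solver room to converge with C=1e5, and the binary target y_binary (class 1 vs. the rest) is a hypothetical relabeling used only to show the (1, n_features) case from the documentation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Data from the question: 8 samples, 6 features, labels in {1, 2, 3, 4}
X = np.array([
    [0.30, 0.100, 0.0, 0.0000, 0.5, 0.0],
    [0.00, 0.006, 0.0, 0.0000, 0.2, 0.0],
    [0.01, 0.678, 0.0, 0.0000, 2.0, 0.0],
    [0.00, 0.333, 1.0, 12.3966, 0.1, 4.0],
    [0.20, 0.005, 1.0, 0.4050, 1.0, 0.0],
    [0.00, 0.340, 1.0, 15.7025, 0.5, 0.0],
    [0.00, 0.440, 1.0, 8.2645, 0.0, 4.0],
    [0.50, 0.055, 1.0, 18.1818, 0.0, 4.0],
])
y = np.array([1, 2, 1, 3, 4, 1, 2, 3])

# Multiclass fit: one row of coefficients and one intercept per class
multi = LogisticRegression(C=1e5, max_iter=10000).fit(X, y)
print(multi.classes_)          # [1 2 3 4]
print(multi.coef_.shape)       # (4, 6)
print(multi.intercept_.shape)  # (4,)

# Hypothetical binary target (class 1 vs. the rest) for comparison
y_binary = (y == 1).astype(int)
binary = LogisticRegression(C=1e5, max_iter=10000).fit(X, y_binary)
print(binary.coef_.shape)       # (1, 6)
print(binary.intercept_.shape)  # (1,)
```

Only in the binary case does coef_ collapse to a single row of 6 values, which is the closest thing to the a-f formula in the question.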

Probably I am doing something wrong, because y_pred = logreg.predict(X) gives me the value of 1 for all rows.

Yes: you shouldn't evaluate your model on the same data you used to train it. Split your data into training and test sets, train your model on the training set, and check its accuracy on the test set.
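The split described above can be sketched with scikit-learn's train_test_split. The 25% test size and random_state=0 are assumptions chosen for illustration; with only 8 rows the test set has just 2 samples, so the resulting accuracy says very little and should not be read as a reliable estimate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data from the question
X = np.array([
    [0.30, 0.100, 0.0, 0.0000, 0.5, 0.0],
    [0.00, 0.006, 0.0, 0.0000, 0.2, 0.0],
    [0.01, 0.678, 0.0, 0.0000, 2.0, 0.0],
    [0.00, 0.333, 1.0, 12.3966, 0.1, 4.0],
    [0.20, 0.005, 1.0, 0.4050, 1.0, 0.0],
    [0.00, 0.340, 1.0, 15.7025, 0.5, 0.0],
    [0.00, 0.440, 1.0, 8.2645, 0.0, 4.0],
    [0.50, 0.055, 1.0, 18.1818, 0.0, 4.0],
])
y = np.array([1, 2, 1, 3, 4, 1, 2, 3])

# Hold out 25% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train only on the training rows
logreg = LogisticRegression(C=1e5, max_iter=10000)
logreg.fit(X_train, y_train)

# Evaluate only on the held-out rows
y_pred = logreg.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(test_accuracy)
```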

MaxU - stand with Ukraine
  • What should I do if I want to get the formula from my question (the coefficients a-f)? I cannot use multiple linear regression, because y is categorical. – Markus Jan 29 '18 at 19:29
  • @Markus, you already have 6 coefficients for each of your 4 classes – MaxU - stand with Ukraine Jan 29 '18 at 19:31
  • So, I have 4 formulas instead of 1 formula. Am I missing something? I need 6 coefficients, not 6x4. Perhaps I should use multiple linear regression, but it is not very suitable for a categorical `y`. – Markus Jan 29 '18 at 19:32
  • @Markus, the nature of logistic regression is to predict `0` or `1` (it's binary), so yes, you will have a set of coefficients for each class. – MaxU - stand with Ukraine Jan 29 '18 at 19:35
  • Regarding the last part of your answer: I would expect `y_pred` to be even more accurate on `X_train`, so I don't see how predicting on `X` explains the incorrect values of `y_pred`. I would understand getting bad results on `X_test`. – Markus Jan 29 '18 at 19:35
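To see how the "4 formulas instead of 1" combine into a single prediction, the per-class scores can be computed by hand from coef_ and intercept_ and compared with what scikit-learn reports. This is a sketch under the same assumptions as the question's setup (C=1e5; max_iter=10000 is added here only to help convergence). It relies on decision_function returning one score per class for multiclass problems and on predict choosing the class with the highest score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Data from the question
X = np.array([
    [0.30, 0.100, 0.0, 0.0000, 0.5, 0.0],
    [0.00, 0.006, 0.0, 0.0000, 0.2, 0.0],
    [0.01, 0.678, 0.0, 0.0000, 2.0, 0.0],
    [0.00, 0.333, 1.0, 12.3966, 0.1, 4.0],
    [0.20, 0.005, 1.0, 0.4050, 1.0, 0.0],
    [0.00, 0.340, 1.0, 15.7025, 0.5, 0.0],
    [0.00, 0.440, 1.0, 8.2645, 0.0, 4.0],
    [0.50, 0.055, 1.0, 18.1818, 0.0, 4.0],
])
y = np.array([1, 2, 1, 3, 4, 1, 2, 3])

logreg = LogisticRegression(C=1e5, max_iter=10000).fit(X, y)

# Evaluate the 4 linear formulas by hand: one score per (row, class), shape (8, 4)
scores = X @ logreg.coef_.T + logreg.intercept_

# The predicted class is the one whose formula gives the highest score
manual_pred = logreg.classes_[np.argmax(scores, axis=1)]

print(np.allclose(scores, logreg.decision_function(X)))  # True
print(np.array_equal(manual_pred, logreg.predict(X)))    # True
```

In other words, each row of coef_ is one of the a-f style formulas, one per class, and the prediction is the argmax over them rather than the value of a single formula.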