I would like to generate odds-ratios or coefficients for various features in my dataset along with their 95% confidence intervals using a logistic regression model.

Since sklearn's logistic regression does not report 95% CI values for odds-ratios or coefficients, I started to play with statsmodels.

However, I am not seeing any standard errors for the coefficients in my output. I am using a very large dataset that contains 17 dummy-coded categorical features and 1 outcome variable, with modest correlation seen for only a couple of features (Pearson's r < 0.45).

My code follows below:

import statsmodels.api as sm

X_atr = sm.add_constant(X_atr)  # add a constant column for the intercept
logit_model = sm.Logit(y_atr, X_atr)  # create the model instance
result = logit_model.fit(method="bfgs")  # fit the model with the BFGS optimizer

print(result.summary())  # print the results table

Here is a sample of my output. I am getting the coefficients, but without their standard errors or 95% CI values. Can somebody suggest how to fix this issue?

veg2020
  • try increasing maxiter, see also https://stackoverflow.com/questions/32926299/how-to-fix-statsmodel-warning-maximum-no-of-iterations-has-exceeded – StupidWolf Jun 04 '20 at 11:40
  • Thank you @StupidWolf. Per your suggestion, I have now set maxiter = 100. This now showed the result as "converged" but still does not show the standard errors with the coefficients. So I think my issue is not related to convergence? As an FYI, I have passed my entire X matrix and y (outcome) variable in my data to statsmodels - does this affect calculation of the standard errors somehow? – veg2020 Jun 04 '20 at 13:40
  • it should not... I think there's something weird with X_atr or y_atr. I ran your code with an example dataset, it works ok. can you share a subset of the data somehow? – StupidWolf Jun 04 '20 at 14:10
  • OK, thank you @StupidWolf. Later today I will try deleting some features and rerunning this analysis. – veg2020 Jun 04 '20 at 14:11
  • hmmm ok i think the issue is this. If one of your features is constant or has very odd values, its std err cannot be estimated. This then propagates to all your variables, causing what you see – StupidWolf Jun 04 '20 at 14:13
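The check suggested in the last comment can be sketched roughly as follows. The helper `diagnose_design` and the demo frame are hypothetical names made up for illustration. The rank check goes beyond what the comment says: it rests on the assumption that a full set of dummy columns alongside the intercept (perfect collinearity) can cause the same symptom as a constant feature.

```python
import numpy as np
import pandas as pd

def diagnose_design(X: pd.DataFrame, intercept: str = "const") -> dict:
    """Flag columns with no variation and test whether the design matrix has full column rank."""
    # A column with a single distinct value carries no information (the intercept is expected).
    constant_cols = [
        c for c in X.columns
        if c != intercept and X[c].nunique(dropna=False) <= 1
    ]
    values = X.to_numpy(dtype=float)
    rank = int(np.linalg.matrix_rank(values))
    return {
        "constant_columns": constant_cols,
        "rank": rank,
        "n_columns": values.shape[1],
        "full_rank": rank == values.shape[1],
    }

# Hypothetical design matrix: three dummies that sum to the intercept, plus an all-zero column.
demo = pd.DataFrame({
    "const": 1.0,
    "cat_a": [1, 0, 0, 1, 0, 0],
    "cat_b": [0, 1, 0, 0, 1, 0],
    "cat_c": [0, 0, 1, 0, 0, 1],
    "always_zero": [0, 0, 0, 0, 0, 0],
})

report = diagnose_design(demo)
print(report)
```

When `full_rank` is false, dropping the flagged constant columns and one reference level per dummy-coded categorical feature is a common remedy. Whether that applies here depends on how `X_atr` was encoded, which the thread does not show.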

0 Answers