
My data

I used statsmodels to build a logistic regression as follows:

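    import numpy as np
    import statsmodels.api as sm

    # Excerpt from inside a loop over i and j: train_data/train_y hold the features
    # and 0/1 labels; p_values/logistic_Coefficients are preallocated arrays.
    X = np.copy(train_data)
    X = sm.add_constant(X)  # prepend an intercept column

    model = sm.Logit(train_y, X)
    result = model.fit(method='bfgs', maxiter=10000)

    p_values[i-1, j-1, :] = result.pvalues
    logistic_Coefficients[i-1, j-1, :] = result.params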

But I get the following warnings, and my p-values are all NaN:

    C:\Users\maryamr\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\statsmodels\base\model.py:488: HessianInversionWarning: Inverting hessian failed, no bse or cov_params available
      'available', HessianInversionWarning)
    C:\Users\maryamr\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
      return (self.a < x) & (x < self.b)
    C:\Users\maryamr\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
      return (self.a < x) & (x < self.b)
    C:\Users\maryamr\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\scipy\stats\_distn_infrastructure.py:1821: RuntimeWarning: invalid value encountered in less_equal
      cond2 = cond0 & (x <= self.a)

I also tried glm in R, and it gives no error; only one of the features gets a NaN coefficient and p-value.
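For context: R's glm reports NA for a coefficient whose column is linearly dependent on the other columns ("not defined because of singularities"). That is most likely what happened to the one feature here, and the same rank deficiency would make the Hessian singular in statsmodels. Below is a rough NumPy/SciPy sketch of that check using column-pivoted QR. The helper name independent_columns, the 1e-10 relative tolerance and the toy data are assumptions, and R uses its own limited-pivoting QR, so the column it drops may differ from the one dropped here.

```python
import numpy as np
from scipy.linalg import qr


def independent_columns(X, rtol=1e-10):
    """Indices of a linearly independent subset of X's columns (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Column-pivoted QR picks, at each step, the column with the largest remaining
    # norm, so dependent columns end up last with near-zero diagonal entries in R.
    _, R, piv = qr(X, mode="economic", pivoting=True)
    d = np.abs(np.diag(R))
    rank = int(np.sum(d > rtol * d[0]))
    return np.sort(piv[:rank])


# Hypothetical toy design: intercept, two features, and their exact sum (aliased).
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
X = np.column_stack([np.ones(50), x1, x2, x1 + x2])

keep = independent_columns(X)
print(keep)              # three of the four column indices
X_reduced = X[:, keep]   # a full-rank design to pass to sm.Logit instead of X
```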

MRM

1 Answer


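Based on the first warning you received ("Inverting hessian failed"), the Statsmodels logistic model could not find a well-defined maximum of the log-likelihood for your data and your choice of dependent and independent variables. The p-values are computed from the standard errors, which come from the inverse of the Hessian at the fitted coefficients; when that matrix cannot be inverted, statsmodels has no bse or cov_params, so every p-value comes out as NaN.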
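To make the link between the warning and the NaNs concrete, here is a minimal NumPy/SciPy sketch of the usual Wald computation for a logistic model: standard errors from the inverse of the information matrix (the negative Hessian), z = coefficient / SE, and two-sided normal p-values. It is an illustration, not statsmodels' own code; the function name logit_wald_pvalues, the 1e12 condition-number cutoff and the toy data are assumptions. A duplicated column makes the matrix singular and the p-values come back as NaN, mirroring the output in the question.

```python
import numpy as np
from scipy import stats


def logit_wald_pvalues(X, beta):
    """Wald p-values for logistic-regression coefficients beta (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    p = 1.0 / (1.0 + np.exp(-X @ beta))    # fitted probabilities
    w = p * (1.0 - p)                      # Bernoulli variance weights
    info = X.T @ (w[:, None] * X)          # information matrix = -Hessian of the log-likelihood
    if np.linalg.cond(info) > 1e12:
        # Numerically singular Hessian: no covariance, hence no standard errors or p-values.
        return np.full(X.shape[1], np.nan)
    cov = np.linalg.inv(info)              # asymptotic covariance of beta
    bse = np.sqrt(np.diag(cov))            # standard errors
    z = beta / bse                         # Wald z-statistics
    return 2.0 * stats.norm.sf(np.abs(z))  # two-sided p-values


rng = np.random.default_rng(0)
x = rng.normal(size=100)
X_ok = np.column_stack([np.ones(100), x])
X_dup = np.column_stack([np.ones(100), x, x])  # second feature duplicates the first

print(logit_wald_pvalues(X_ok, np.array([0.2, 0.5])))          # finite p-values
print(logit_wald_pvalues(X_dup, np.array([0.2, 0.25, 0.25])))  # [nan nan nan]
```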

Looking at your data, you have a lot of zeros and identical values, which may be problematic for finding a solution: a column that is constant, or identical to another column, makes the design matrix rank deficient, and then the Hessian is singular. But since it looks like you obtained convergence in R, you can try changing some of the Statsmodels parameters of the model to see if it helps (or first find out which settings R's glm function used and replicate them with Statsmodels).
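A quick way to check for the problems described above is to look for constant columns (for example all zeros), exactly repeated columns, and a design matrix whose rank is lower than its number of parameters. The helper below is a hypothetical sketch: diagnose_design, the 1e-10 relative tolerance and the toy train_data are illustrative assumptions, not part of statsmodels. If the rank is below the number of parameters, the logit Hessian is singular at any finite coefficients, whichever optimizer is used.

```python
import numpy as np


def diagnose_design(X):
    """Flag columns that commonly make the logit Hessian singular (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n_cols = X.shape[1]
    # Constant columns (e.g. all zeros) are collinear with the intercept added later.
    constant = [j for j in range(n_cols) if X[:, j].max() == X[:, j].min()]
    # Exactly repeated columns carry no extra information.
    duplicates = [(a, b) for a in range(n_cols) for b in range(a + 1, n_cols)
                  if np.array_equal(X[:, a], X[:, b])]
    # Rank of the design once the intercept column is prepended (relative SVD cutoff).
    with_const = np.column_stack([np.ones(X.shape[0]), X])
    s = np.linalg.svd(with_const, compute_uv=False)
    rank = int(np.sum(s > 1e-10 * s[0]))
    return {"constant": constant, "duplicates": duplicates,
            "rank_with_const": rank, "n_params": with_const.shape[1]}


# Hypothetical toy features: column 0 is all zeros, columns 1 and 2 are identical.
train_data = np.array([[0, 1, 1, 5],
                       [0, 2, 2, 3],
                       [0, 0, 0, 4],
                       [0, 3, 3, 1],
                       [0, 1, 1, 2]])
report = diagnose_design(train_data)
print(report)
```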

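For example, the Logit.fit method allows you to select one of several predefined optimization methods through its method argument (such as 'newton', 'bfgs', 'lbfgs', 'nm', 'powell', 'cg', 'ncg' and 'basinhopping'). 'nm' (Nelder-Mead) is recommended by others for such situations.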

You can find the Statsmodels Logit.fit documentation here: http://www.statsmodels.org/devel/generated/statsmodels.discrete.discrete_model.Logit.fit.html

You can also post this question on the Cross Validated site, as you may get more responses there.

AlexK