Lately, I've been working with a risk model in Python.

My target variable is a frequency (float), calculated as freq = number of events / total exposure. The dataset is highly imbalanced (~98% of the target values are zero), so the frequency distribution can be approximated by a Zero-Inflated Poisson distribution.

I've preprocessed all of my features: the categorical ones are one-hot encoded (after checking that every category makes sense and is relevant) and the numeric ones are scaled.

Still, when I try to fit a Zero-Inflated Poisson model from statsmodels, it fails to converge:

from statsmodels.discrete.count_model import ZeroInflatedPoisson

model = ZeroInflatedPoisson(endog=y_train,
                            exog=X_train_prep,
                            exog_infl=X_train_prep)


modelzip = model.fit()
modelzip.summary()

The warning I get is:

ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals

Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.937949
         Iterations: 35
         Function evaluations: 37
         Gradient evaluations: 37

I have no idea what's happening: when I check the model summary, most of my features' coefficients have statistically significant p-values. The same convergence issue happens when I use the Generalized Poisson class from statsmodels: from statsmodels.discrete.discrete_model import GeneralizedPoisson.

Another weird thing is that when I fit a Poisson GLM using the class in statsmodels.api, I get no convergence problem (though it's not what I want to use, since it's a regular Poisson rather than a Zero-Inflated one):

import statsmodels.api as sm

model = sm.GLM(y_train,
               X_train_prep,
               family=sm.families.Poisson())

# Fit the regular Poisson GLM (this one converges)
result = model.fit()

Does anyone know why I get this problem only with the statsmodels.discrete.count_model class? Also, is there another Zero-Inflated Poisson class in statsmodels besides the one in statsmodels.discrete.count_model, which is not converging?

Thank you!

  • try maxiter=1000 in fit. The default maxiter looks much too small for this case. If there are more problems, you could try different optimization methods. – Josef Jun 16 '23 at 15:59
  • I've already tried both of your suggestions and I still get the ConvergenceWarning – Lídia Gusmão Jun 23 '23 at 19:22
  • Is your target variable in the model frequencies or counts? Count models like Poisson or ZIP require counts (integers or data on the nonnegative real line). – Josef Jun 23 '23 at 20:17
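
The hints in the comments above can be combined into a rough sketch: pass event counts rather than frequencies as the target, supply the exposure through the model's exposure argument, and raise maxiter above the default of 35. The data below is synthetic and every variable name (x, exposure, y_counts, and so on) is a made-up stand-in for the real training data. The intercept-only inflation part is also an assumption chosen to keep the example small, not something taken from the question.

```python
import numpy as np
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(0)
n = 5000

# Synthetic stand-ins for the real data (all values here are made up).
x = rng.normal(size=n)
exposure = rng.uniform(0.5, 2.0, size=n)
X = np.column_stack([np.ones(n), x])  # intercept + one feature

# Simulate zero-inflated counts: structural zeros with probability 0.6,
# otherwise Poisson counts whose mean is proportional to the exposure.
structural_zero = rng.random(n) < 0.6
counts = np.where(structural_zero, 0,
                  rng.poisson(exposure * np.exp(0.2 + 0.5 * x)))

# A frequency target like the one described in the question.
freq = counts / exposure

# Recover integer event counts from the frequency and the exposure.
y_counts = np.rint(freq * exposure).astype(int)

model = ZeroInflatedPoisson(
    endog=y_counts,              # counts, not frequencies
    exog=X,
    exog_infl=np.ones((n, 1)),   # assumption: intercept-only inflation part
    exposure=exposure,           # enters the count part as a log offset
)

# Raise maxiter well above the default and pick the optimizer explicitly.
result = model.fit(method="bfgs", maxiter=1000, disp=False)

print(result.mle_retvals["converged"])
print(result.params)
```

If the optimizer still reports non-convergence on real data, a smaller set of columns in exog_infl or a different method (for example "nm" or "newton") may be worth trying, though whether that helps depends on the data.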
