Lately, I've been working with a risk model in Python.
My target variable is a frequency (float), which was calculated as
freq = number of events / total exposure
and my dataset is heavily imbalanced: about 98% of my target values are zero, so the frequency distribution can be approximated by a Zero-Inflated Poisson distribution.
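To make the setup concrete, here is a minimal sketch of how such a target is built. The data are made up: the names `events` and `exposure`, the rate of 0.02, and the sample size are placeholders, not my real data, and the counts are drawn from a plain Poisson just to illustrate the computation.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical inputs: exposure per record and a rare event count.
exposure = rng.uniform(0.1, 1.0, size=n)
events = rng.poisson(0.02 * exposure)

# Target as defined above: events divided by exposure (a float, not a count).
freq = events / exposure

# Share of records whose target is exactly zero.
zero_share = np.mean(freq == 0)
print(f"share of zeros: {zero_share:.3f}")
```

With rare events almost every record ends up with a frequency of zero, which is the kind of imbalance I see in my data.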
I've preprocessed all of my features: the categorical ones are one-hot encoded (after checking that every category makes sense and is relevant) and the numeric ones are scaled.
Still, when I try to fit a Zero-Inflated Poisson model from statsmodels, it fails to converge:
from statsmodels.discrete.count_model import ZeroInflatedPoisson
model = ZeroInflatedPoisson(endog=y_train,
                            exog=X_train_prep,
                            exog_infl=X_train_prep)
modelzip = model.fit()
modelzip.summary()
The warning I get is:
ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.937949
Iterations: 35
Function evaluations: 37
Gradient evaluations: 37
I have no idea what's happening: looking at the model summary, most of my feature coefficients have significant p-values. The same convergence issue occurs when I use the GeneralizedPoisson class from statsmodels (from statsmodels.discrete.discrete_model import GeneralizedPoisson).
Another odd thing is that when I fit a Poisson GLM through statsmodels.api, I get no convergence problem (though it's not what I want to use, since it's a regular Poisson rather than a zero-inflated one):
import statsmodels.api as sm
model = sm.GLM(y_train,
               X_train_prep,
               family=sm.families.Poisson())
Does anyone know why I get this problem only with the statsmodels.discrete.count_model class? Also, is there another Zero-Inflated Poisson class in statsmodels besides the one in statsmodels.discrete.count_model, which is not converging?
Thank you!