I'm trying to implement a glm with a negative binomial distribution in R and have a few questions. Here are the data I'm working with - note that my predictors are all centred and scaled using 2 standard deviations.
# Reproducible dataset: 48 observations with a count response (response_var),
# an exposure/offset variable (offset_var, entered as log(offset_var) in the
# models below), and six predictors (pred1..pred6). NOTE(review): the text
# says predictors were "centred and scaled using 2 standard deviations", but
# the values shown all lie in [0, 1] with exact 0s and 1s present, which looks
# like min-max rescaling rather than centring — worth confirming.
# Built via structure() (dput-style) rather than data.frame(); row.names
# c(NA, -48L) is R's compact encoding of automatic row names 1:48.
df <- structure(list(offset_var = c(1.28599085563929, 0.56011798945881,
0.910539373132146, 1.40150012655384, 1.99332290786881, 0.431352036569322,
1.9664387276399, 0.39501204122542, 0.837952086514098, 1.49309174747163,
0.185176379335315, 0.340780160418832, 0.23261768703236, 1.9358463846208,
0.683228994093252, 0.766384544663353, 2.36951602293399, 0.570222733062088,
2.035225147398, 2.66190465894079, 2.59441609615926, 2.45851192989865,
9.10131475225225, 1.79085721952421, 1.14341199635633, 1.86464965904907,
0.969684569635742, 1.48586083003075, 0.398395739771472, 1.57819408350658,
0.348242838034098, 1.01779798183851, 1.56020949397414, 1.14211413937976,
1.15472189286642, 2.17372502761175, 2.17372502761175, 1.15472189286642,
0.91480376034087, 0.52428786649856, 0.663761139444733, 1.85426680323653,
0.799109081286816, 1.2592920835499, 1.32010595330908, 1.9473416892704,
1.74079702185659, 1.92731761020335), response_var = c(31, 154,
40, 71, 3, 43, 66, 180, 62, 27, 84, 81, 66, 14, 40, 35, 48, 155,
23, 37, 33, 88, 38, 8, 73, 29, 11, 12, 74, 70, 298, 40, 24, 35,
0, 27, 0, 116, 83, 60, 77, 68, 56, 8, 64, 234, 57, 71), pred1 = c(0.593253968253968,
0.212619047619048, 0.208888888888889, 0.824603174603175, 0.791031746031746,
0.096984126984127, 0.589523809523809, 0.096984126984127, 0.783571428571429,
0.156666666666667, 0, 0.361904761904762, 0.164126984126984, 0.723809523809524,
0.123095238095238, 0.671587301587302, 0.708888888888889, 0.186507936507937,
0.77984126984127, 0.0820634920634921, 0.220079365079365, 1, 0.27984126984127,
0.835793650793651, 0.526031746031746, 0.0298412698412698, 0.641746031746032,
0.0783333333333333, 0.156666666666667, 0.791031746031746, 0.167857142857143,
0.641746031746032, 0.817142857142857, 0.126825396825397, 0.160396825396825,
0.776031746031746, 0.776031746031746, 0.160396825396825, 0.130555555555556,
0.600714285714286, 0.123095238095238, 0.70515873015873, 0.164126984126984,
0.492460317460317, 0.0932539682539683, 0.596984126984127, 0.134285714285714,
0.899206349206349), pred3 = c(0.779445727482678, 0.779445727482678,
0.981524249422633, 0.981524249422633, 0.431293302540416, 0.431293302540416,
0.85796766743649, 0.85796766743649, 0.974018475750578, 0.974018475750578,
0.215357967667437, 0.215357967667437, 0.767898383371826, 0.767898383371826,
0.929561200923788, 0.929561200923788, 0.319861431870669, 0.319861431870669,
0.930138568129331, 0.930138568129331, 1, 1, 0.753464203233256,
0.753464203233256, 0.937644341801385, 0.937644341801385, 0.438799076212472,
0.438799076212472, 0.960161662817552, 0.960161662817552, 0.852193995381064,
0.852193995381064, 0.803117782909932, 0.803117782909932, 0.571593533487299,
0.571593533487299, 0.571593533487299, 0.571593533487299, 0, 0,
0.801385681293304, 0.801385681293304, 0.656466512702079, 0.656466512702079,
0.852771362586606, 0.852771362586606, 0.203810623556583, 0.203810623556583
), pred6 = c(0.688053824835413, 0.540952466204905, 0.628399305213738,
0.6881894451542, 0.780506864063453, 0.596192469087064, 0.664351792193705,
0.601630037911551, 0.725643181043039, 0.617634868802872, 0.811784880984014,
0.804339479465123, 0.576349468957153, 0.847678636187369, 0.625702671539389,
0.719554647550091, 0.363763777714112, 0.446477292582477, 0.474674252997289,
0.480035185899025, 0.4937661725622, 0, 0.513379111687292, 0.903418101749096,
0.695930304479286, 0.617335579129072, 0.759590008590574, 0.654198263624173,
0.638351745840282, 0.758157124611394, 0.616963963666818, 0.749652691590418,
0.626922408721354, 0.587580912470206, 0.430636110291636, 0.349235608213707,
0.349235608213707, 0.430636110291636, 0.723455668440276, 1, 0.615287637567703,
0.696660949537828, 0.570881476208878, 0.51486263616898, 0.599650096633513,
0.661291018750272, 0.44995045321675, 0.317335721700076), pred2 = c(0.0640028225107189,
0.0640028225107189, 1, 1, 0.0467149443570715, 0.0467149443570715,
0.955928169494161, 0.955928169494161, 0.991149754525286, 0.991149754525286,
0.636541945977623, 0.636541945977623, 0.0695282460368243, 0.0695282460368243,
0.945044759518499, 0.945044759518499, 0.666620820800469, 0.666620820800469,
0.865751343981534, 0.865751343981534, 0.158030700783964, 0.158030700783964,
0, 0, 0.941696017987526, 0.941696017987526, 0.194526003575978,
0.194526003575978, 0.974286448958601, 0.974286448958601, 0.182153599598151,
0.182153599598151, 0.834117696305023, 0.834117696305023, 0.729828317197582,
0.729828317197582, 0.729828317197582, 0.729828317197582, 0.477488683047594,
0.477488683047594, 0.914726688872012, 0.914726688872012, 0.0551346373492319,
0.0551346373492319, 0.950905057197701, 0.950905057197701, 0.653584648412039,
0.653584648412039), pred4 = c(0.521197167238095, 0.521197167238095,
0.675165075565519, 0.675165075565519, 0.754750741994489, 0.754750741994489,
0.656214157700682, 0.656214157700682, 0.779025156922262, 0.779025156922262,
0.466656174770789, 0.466656174770789, 1, 1, 0.19412522736884,
0.19412522736884, 0, 0, 0.914127451104004, 0.914127451104004,
0.450341983837599, 0.450341983837599, 0.200288723846985, 0.200288723846985,
0.51864603534987, 0.51864603534987, 0.632983879774839, 0.632983879774839,
0.219950187016786, 0.219950187016786, 0.378371968866928, 0.378371968866928,
0.332552996854781, 0.332552996854781, 0.543573394076179, 0.543573394076179,
0.543573394076179, 0.543573394076179, 0.110505257947263, 0.110505257947263,
0.80446053190608, 0.80446053190608, 0.942739062050578, 0.942739062050578,
0.769130578963835, 0.769130578963835, 0.910376608809312, 0.910376608809312
), pred5 = c(1, 1, 0.277346201087263, 0.277346201087263, 0.977777461589862,
0.977777461589862, 0.690563309054654, 0.690563309054654, 0.439008770827643,
0.439008770827643, 0.819995545023047, 0.819995545023047, 0.579039846892373,
0.579039846892373, 0.365538938318013, 0.365538938318013, 0.0971685384896833,
0.0971685384896833, 0.599718772091419, 0.599718772091419, 0.871837464542416,
0.871837464542416, 0.0550210151699806, 0.0550210151699806, 0.740706048133834,
0.740706048133834, 0.0826796777514591, 0.0826796777514591, 0.52020534667614,
0.52020534667614, 0, 0, 0.660904950317592, 0.660904950317592,
0.790027991312289, 0.790027991312289, 0.790027991312289, 0.790027991312289,
0.206354045553633, 0.206354045553633, 0.600235537123976, 0.600235537123976,
0.829243121962118, 0.829243121962118, 0.595320097430252, 0.595320097430252,
0.17138558268267, 0.17138558268267)), row.names = c(NA, -48L), class = "data.frame")
First off, I tried running the model using the glm.nb function in the MASS package, but kept getting non-convergence warnings (glm.fit: algorithm did not converge) even after increasing the number of iterations beyond the default 25 (I tried 50, 100, 250, 1000, and even 5000):
> glm1 <- glm.nb(response_var~pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 + pred6 + offset(log(offset_var)), data = df, control = glm.control(maxit = 5000))
There were 50 or more warnings (use warnings() to see the first 50) #these are the non-convergence warnings
> summary(glm1)
Call:
glm.nb(formula = response_var ~ pred1 * pred2 + pred1 * pred3 +
pred1 * pred4 + pred1 * pred5 + pred6 + offset(log(offset_var)),
data = df, control = glm.control(maxit = 5000), init.theta = 0.6779942761,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.8944 -0.9971 -0.4541 0.2178 1.9029
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.3937 1.2057 6.132 8.65e-10 ***
pred1 -5.2640 2.0354 -2.586 0.0097 **
pred2 -0.9040 0.8356 -1.082 0.2793
pred3 -0.6444 1.0470 -0.616 0.5382
pred4 -1.1811 1.1321 -1.043 0.2968
pred5 -0.2583 1.0225 -0.253 0.8006
pred6 -0.6567 1.0222 -0.642 0.5206
pred1:pred2 2.0800 1.5741 1.321 0.1864
pred1:pred3 1.7663 2.0589 0.858 0.3910
pred1:pred4 0.6586 2.1055 0.313 0.7544
pred1:pred5 0.4524 2.0225 0.224 0.8230
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.791) family taken to be 1)
Null deviance: 68.201 on 47 degrees of freedom
Residual deviance: 57.762 on 37 degrees of freedom
AIC: 541.63
Number of Fisher Scoring iterations: 5000
Theta: 0.678
Std. Err.: 0.124
Warning while fitting theta: alternation limit reached
2 x log-likelihood: -517.634
I know I am trying to fit a complex model given my sample size, so I am assuming this is why the model is not converging (I want to fit such a complex model because I am going to use it as the global model upon which to perform model averaging). However, I then tried alternative packages that can fit glms with negative binomial distributions in R. When I tried glmmTMB
and brglm2
the models converged and I got the following (very similar) results:
library(glmmTMB)
library(brglm2)
> #fit model using glmmTMB
> glm2 <- glmmTMB(response_var~pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 + pred6 + offset(log(offset_var)), data = df, family = nbinom2)
> summary(glm2)
Family: nbinom2 ( log )
Formula: response_var ~ pred1 * pred2 + pred1 * pred3 + pred1 * pred4 + pred1 * pred5 + pred6 + offset(log(offset_var))
Data: df
AIC BIC logLik deviance df.resid
538.8 561.3 -257.4 514.8 36
Dispersion parameter for nbinom2 family (): 0.815
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.09772 1.36899 5.185 2.16e-07 ***
pred1 -5.42650 2.23604 -2.427 0.0152 *
pred2 -1.10202 0.79743 -1.382 0.1670
pred3 -0.39245 1.14313 -0.343 0.7314
pred4 -1.33843 1.32065 -1.013 0.3108
pred5 -0.07125 1.09954 -0.065 0.9483
pred6 -0.17567 1.03227 -0.170 0.8649
pred1:pred2 2.80613 1.69120 1.659 0.0971 .
pred1:pred3 1.04996 2.04619 0.513 0.6079
pred1:pred4 1.15544 2.37031 0.487 0.6259
pred1:pred5 -0.10083 2.31516 -0.044 0.9653
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> #fit model using brglm2
> glm_3 <- brnb(response_var~pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 + pred6 + offset(log(offset_var)), data = df, link = "log", type = "ML", transformation = "identity")
> summary(glm_3)
Call:
brnb(formula = response_var ~ pred1 * pred2 + pred1 * pred3 + pred1 * pred4 + pred1 * pred5 + pred6 + offset(log(offset_var)), data = df, link = "log", type = "ML",
transformation = "identity")
Coefficients (mean model with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.09773 1.18882 5.970 2.37e-09 ***
pred1 -5.42651 2.00939 -2.701 0.00692 **
pred2 -1.10203 0.82354 -1.338 0.18084
pred3 -0.39243 1.03189 -0.380 0.70372
pred4 -1.33854 1.11593 -1.199 0.23034
pred5 -0.07116 1.00813 -0.071 0.94373
pred6 -0.17567 1.00911 -0.174 0.86180
pred1:pred2 2.80608 1.55187 1.808 0.07058 .
pred1:pred3 1.05001 2.03198 0.517 0.60534
pred1:pred4 1.15563 2.07750 0.556 0.57803
pred1:pred5 -0.10104 1.99736 -0.051 0.95965
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter with identity transformation function: 1.2267971)
Null deviance: 81.222 on 47 degrees of freedom
Residual deviance: 57.153 on 37 degrees of freedom
AIC: 538.825
Type of estimator: ML (maximum likelihood)
Number of quasi-Fisher scoring iterations: 32
My questions are:
- Why would the model not converge using glm.nb but converge using the other two packages? Should I be suspicious of the results from glmmTMB or brglm2, given the convergence issues when using glm.nb?
- glmmTMB was by far the fastest, so I'd like to use that package going forward. However, given that glmmTMB is a package for generalized linear mixed modelling, is there any issue with fitting a model in glmmTMB without random effects? Or would the only tradeoff be the slightly higher standard errors for the coefficients fit using glmmTMB compared with those from brglm2?