My zero inflation regression makes random NaNs

Question

I'm comparing bee diversity between different places: inside vs. outside apple orchards (Placement), between farms (Farm), and between cultivars (Cultivar). All three variables work fine for honeybees, but for the solitary bees ("wildbees") and for some of the bumblebee models I get NaNs where I expect to see results. How can I fix this? Do I simply have too few data points? I have just under 500 traps, in which I found ~500 honeybees, ~150 bumblebees and ~70 solitary bees. Sorry if I haven't given enough information to answer my question; it's my first time using a forum like this!

Also, what does theta mean? It's not one of my variables.

This is what happened for inside vs outside


> p1 <- zeroinfl(Honeybee...29 ~ Placement, data = diversity, dist = "negbin")
> summary(p1)

Call:
zeroinfl(formula = Honeybee...29 ~ Placement, data = diversity, dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-0.5872 -0.5872 -0.4080  0.1483 11.2744 

Count model coefficients (negbin with log link):
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    0.20377    0.11021   1.849   0.0645 .  
PlacementWild  0.09042    0.23433   0.386   0.6996    
Log(theta)    -0.73459    0.16043  -4.579 4.67e-06 ***

Zero-inflation model coefficients (binomial with logit link):
              Estimate Std. Error z value Pr(>|z|)
(Intercept)     -9.685     53.494  -0.181    0.856
PlacementWild    9.499     53.489   0.178    0.859
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 0.4797 
Number of iterations in BFGS optimization: 127 
Log-likelihood: -607.6 on 5 Df
> 
> p2 <- zeroinfl(Bumblebee ~ Placement, data = diversity, dist = "negbin")
> summary(p2)

Call:
zeroinfl(formula = Bumblebee ~ Placement, data = diversity, dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-0.3761 -0.3761 -0.3289 -0.3289  5.9230 

Count model coefficients (negbin with log link):
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -1.6686     0.1939  -8.607  < 2e-16 ***
PlacementWild   1.8530     0.3853   4.809 1.52e-06 ***
Log(theta)     -0.5619     0.5234  -1.074    0.283    

Zero-inflation model coefficients (binomial with logit link):
              Estimate Std. Error z value Pr(>|z|)
(Intercept)     -6.747     76.346  -0.088    0.930
PlacementWild    7.366     76.258   0.097    0.923
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 0.5701 
Number of iterations in BFGS optimization: 84 
Log-likelihood: -299.5 on 5 Df
> 
> p3 <- zeroinfl(Wildbee ~ Placement, data = diversity, dist = "negbin")
> summary(p3) 

Call:
zeroinfl(formula = Wildbee ~ Placement, data = diversity, dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-0.3369 -0.3369 -0.3132 -0.3132  7.1499 

Count model coefficients (negbin with log link):
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -1.7629     0.2065  -8.535  < 2e-16 ***
PlacementWild   0.2712     0.2817   0.963    0.336    
Log(theta)     -1.4738     0.2675  -5.510 3.58e-08 ***

Zero-inflation model coefficients (binomial with logit link):
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    -12.886    325.692   -0.04    0.968
PlacementWild   -7.375        NaN     NaN      NaN
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 0.2291 
Number of iterations in BFGS optimization: 18 
Log-likelihood: -246.7 on 5 Df
Warning message:
In sqrt(diag(object$vcov)) : NaNs produced

This is what happened between farms

> f1 <- zeroinfl(Honeybee...29 ~ Farm, data = diversity, dist = "negbin")
> summary(f1)

Call:
zeroinfl(formula = Honeybee...29 ~ Farm, data = diversity, dist = "negbin")

Pearson residuals:
     Min       1Q   Median       3Q      Max 
-0.54910 -0.49323 -0.44511  0.01922  8.19537 

Count model coefficients (negbin with log link):
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)       0.2768     0.1441   1.920   0.0548 .  
FarmFruktgården  -0.4622     0.3009  -1.536   0.1245    
FarmSando        -0.1801     0.2744  -0.656   0.5116    
Log(theta)       -0.9391     0.1857  -5.056 4.27e-07 ***

Zero-inflation model coefficients (binomial with logit link):
                Estimate Std. Error z value Pr(>|z|)
(Intercept)       -8.688     42.152  -0.206    0.837
FarmFruktgården    7.379     42.133   0.175    0.861
FarmSando          6.753     42.126   0.160    0.873
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 0.391 
Number of iterations in BFGS optimization: 32 
Log-likelihood: -612.9 on 7 Df
> 
> f2 <- zeroinfl(Bumblebee ~ Farm|Cultivar, data = diversity, dist = "negbin")
> summary(f2)

Call:
zeroinfl(formula = Bumblebee ~ Farm | Cultivar, data = diversity, dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-0.4019 -0.3099 -0.3099 -0.2951 12.0230 

Count model coefficients (negbin with log link):
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -1.9397     0.2679  -7.241 4.46e-13 ***
FarmFruktgården   0.1666     0.3705   0.450    0.653    
FarmSando         1.4289     0.3324   4.299 1.71e-05 ***
Log(theta)       -1.5096     0.2126  -7.102 1.23e-12 ***

Zero-inflation model coefficients (binomial with logit link):
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)        -13.227    362.819  -0.036    0.971
CultivarDiscovery   -3.861        NaN     NaN      NaN
CultivarSummerred   -5.736   6888.605  -0.001    0.999
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 0.221 
Number of iterations in BFGS optimization: 14 
Log-likelihood: -296.3 on 7 Df

On what theta is, see https://stats.stackexchange.com/a/115119/87002 and https://stats.stackexchange.com/a/10442/87002 - Phil, Mar 28 '23 at 03:52
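As a rough illustration of what theta does: in a negative binomial count model, theta is the dispersion (size) parameter, and the variance is mu + mu^2 / theta, so a small theta means strong overdispersion. The sketch below plugs in the honeybee estimates from the output above (Theta = 0.4797, count intercept 0.20377) and ignores the zero-inflation part; the simulation settings (seed, sample size) are arbitrary choices, not part of the original analysis.

```r
# Values copied from the summary(p1) output in the question.
theta <- 0.4797          # exp(Log(theta)) = exp(-0.73459)
mu    <- exp(0.20377)    # expected count at the reference Placement level

# Negative binomial variance: larger than the Poisson variance (mu)
# by the extra term mu^2 / theta.
var_nb <- mu + mu^2 / theta

# Simulate counts with that mean and dispersion (base R; `size` is theta).
set.seed(1)
y <- rnbinom(1e5, size = theta, mu = mu)

print(c(theoretical_mean = mu, theoretical_var = var_nb))
print(c(simulated_mean = mean(y), simulated_var = var(y)))
```

As theta grows, the extra term mu^2 / theta shrinks and the model approaches a Poisson model.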

score 0 · Answer 1 · answered Mar 28 '23 at 06:17

It seems that the terms with NaN for the standard error, z value and p-value belong to factor levels for which the count is always 0 in some combinations of factors. This cannot be checked, however, because only a snapshot of the data is provided.
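A quick way to test this guess on the real data is to cross-tabulate each factor against whether the trap caught anything. The sketch below uses a tiny invented stand-in for `diversity`; the level name "Orchard" and all counts are hypothetical, and only the column names Placement and Wildbee come from the question.

```r
# Toy stand-in for the asker's `diversity` data frame; values are invented.
diversity <- data.frame(
  Placement = factor(c("Orchard", "Orchard", "Orchard", "Wild", "Wild", "Wild")),
  Wildbee   = c(0, 2, 0, 0, 0, 0)
)

# Rows: factor level; columns: did the trap catch any bees at all?
zero_tab <- table(diversity$Placement, diversity$Wildbee > 0,
                  dnn = c("Placement", "any_bees"))
print(zero_tab)

# Number of non-zero traps per level; a level with none is always 0.
n_pos <- tapply(diversity$Wildbee > 0, diversity$Placement, sum)
all_zero_levels <- names(n_pos)[n_pos == 0]
print(all_zero_levels)   # "Wild" in this toy example
```

The same table can be built for Farm and Cultivar, or for combinations of factors by passing several factors to `table()`.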

The warning message that is displayed, In sqrt(diag(object$vcov)) : NaNs produced, also points in this direction, according to previous Stack Overflow posts.
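To see where the NaN itself comes from: the standard errors are the square roots of the diagonal of the estimated variance-covariance matrix, and when the fit is degenerate a diagonal entry can come out negative. The matrix below is made up purely for illustration; on the real fit, the matrix named in the warning is `p3$vcov`.

```r
# Hypothetical 2 x 2 variance-covariance matrix with one negative
# diagonal entry, mimicking a badly identified zero-inflation term.
terms <- c("(Intercept)", "PlacementWild")
v <- matrix(c(0.04, 0, 0, -1e-3), nrow = 2, dimnames = list(terms, terms))

# Same computation as in the warning: sqrt(diag(vcov)).
# The square root of a negative variance is NaN.
se <- suppressWarnings(sqrt(diag(v)))
print(se)

# Terms whose standard error could not be computed.
bad_terms <- names(se)[is.nan(se)]
print(bad_terms)   # "PlacementWild"
```

Once the standard error is NaN, the z value and p-value derived from it are NaN as well, which matches the pattern in the summaries above.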

So your question is probably a duplicate.
