0

I have a fairly complicated ZINB model. I have tried to replicate the basic structure of what I'm trying to do:

MyDat<-cbind.data.frame(fac1 = rep(c("A","B","C","D"),10), 
       fac2=c(rep("X",20),rep("Y",20)), 
       offset=c(runif(20, 50,60),runif(20,150,165)), 
       fac3=rep(c(rep("a1",4),rep("a2",4),rep("a3",4),rep("a4",4),rep("a5",4)),2),
       Y=c(0,0,0,1,0,0,11,10,0,0,0,5,0,0,0,35,60,0,0,0,0,2,0,0,16,0,0,0,0,0,3,88,0,0,0,0,0,0,27,0))

f<-formula(Y~fac1+ offset(log(offset))|fac3+ fac2) 
ZINB <-zeroinfl(f, dist = "negbin",link = "logit", data = MyDat)
summary(ZINB)

The primary goal of this model is to look at the effect of fac1 across the four levels. The other variables are more just artifacts of the sampling process.

Here is the output:

Call:
zeroinfl(formula = f, data = MyDat, dist = "negbin", link = "logit")

Pearson residuals:
      Min        1Q    Median        3Q       Max 
-0.418748 -0.338875 -0.265109 -0.001566  2.682920 

Count model coefficients (negbin with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.7192     0.9220  -1.865 0.062239 .  
fac1B        -4.4161     1.4700  -3.004 0.002663 ** 
fac1C        -1.2008     1.2896  -0.931 0.351778    
fac1D         0.1928     1.3003   0.148 0.882157    
Log(theta)   -1.7349     0.4558  -3.806 0.000141 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.5899   210.8434  -0.055    0.956
fac3a2       -0.4775     2.4608  -0.194    0.846
fac3a3      -11.2284   427.5200  -0.026    0.979
fac3a4       10.7771   210.8056   0.051    0.959
fac3a5       -0.3135     2.3358  -0.134    0.893
fac2Y        11.8292   210.8298   0.056    0.955
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 0.1764 
Number of iterations in BFGS optimization: 76 
Log-likelihood: -63.82 on 11 Df

I have consulted papers and stats books and forums, but I'm still not sure how to present this information. What I really want is a bar plot showing the effects on the Y-axis and the 4 levels on the X.

If I understand correctly, level A of fac1 is currently set at 0, and is my reference level (please correct me if I'm wrong here). So, I can make a plot of the 4 levels (including level A as zero). This doesn't seem ideal. I would really like to have 95%CIs for all levels.

I can also use the predict function, however predict.zeroinfl does not give error estimates, and I'm unsure how to interpret the effect of the offset.

Similar papers have just put a boxplot of the original data next to a boxplot of the predictions and compared. I feel like I should be able to do better.

Below is the code and plot to create the predicted values:

MyDat$phat<-predict(ZINB, type="response")
MyDat$phat_os<-MyDat$phat/MyDat$offset

plot(phat~fac1, MyDat)

Predictions plot

Is bootstrapping the way to go? I have tried this and run into all kinds of trouble for something I'm not sure is necessary.

Thank you in advance, and please go easy on me if I'm making a silly oversight/assumption. I'm still learning, but these stats feel a bit out of my reach.

CAS
  • 1
  • 3

1 Answers1

1

For starters, you can plot the model coefficients with their confidence intervals. The arm package has the coefplot function, but it doesn't have a method for zeroinfl models, so I've created a simple coefficient plot below using ggplot2. The predict method for zeroinfl models doesn't provide confidence intervals for predictions, but this answer to a question on CrossValidated shows how to construct bootstrapped confidence intervals for zeroinfl models.

Regarding the levels of fac1: A is the reference level, so the coefficients for the other levels are relative to fac1 = "A".

library(pscl)
library(ggplot2)

MyDat<-cbind.data.frame(fac1 = rep(c("A","B","C","D"),10), 
                       fac2=c(rep("X",20),rep("Y",20)), 
                       offset=c(runif(20, 50,60),runif(20,150,165)), 
                       fac3=rep(c(rep("a1",4),rep("a2",4),rep("a3",4),rep("a4",4),rep("a5",4)),2),
                       Y=c(0,0,0,1,0,0,11,10,0,0,0,5,0,0,0,35,60,0,0,0,0,2,0,0,16,0,0,0,0,0,3,88,0,0,0,0,0,0,27,0))

f<-formula(Y ~ fac1 + offset(log(offset))|fac3 + fac2) 
ZINB <-zeroinfl(f, dist = "negbin",link = "logit", data = MyDat)

# Extract coefficients and standard errors from model summary
coefs = as.data.frame(summary(ZINB)$coefficients$count[,1:2])
names(coefs)[2] = "se" 
coefs$vars = rownames(coefs)

# Coefficient plot
ggplot(coefs, aes(vars, Estimate)) + 
  geom_hline(yintercept=0, lty=2, lwd=1, colour="grey50") +
  geom_errorbar(aes(ymin=Estimate - 1.96*se, ymax=Estimate + 1.96*se), 
                lwd=1, colour="red", width=0) +
  geom_errorbar(aes(ymin=Estimate - se, ymax=Estimate + se), 
                lwd=2.5, colour="blue", width=0) +
  geom_point(size=4, pch=21, fill="yellow") +
  theme_bw()

And here's what the plot looks like.

enter image description here

Community
  • 1
  • 1
eipi10
  • 91,525
  • 24
  • 209
  • 285
  • Thanks for taking the time to answer my question! If I'm understanding you correctly, if I want CIs on fac1A bootstrapping is necessary? – CAS Apr 14 '16 at 01:12