I have a fairly complicated ZINB model. I have tried to replicate the basic structure of what I'm trying to do:
MyDat<-cbind.data.frame(fac1 = rep(c("A","B","C","D"),10),
fac2=c(rep("X",20),rep("Y",20)),
offset=c(runif(20, 50,60),runif(20,150,165)),
fac3=rep(c(rep("a1",4),rep("a2",4),rep("a3",4),rep("a4",4),rep("a5",4)),2),
Y=c(0,0,0,1,0,0,11,10,0,0,0,5,0,0,0,35,60,0,0,0,0,2,0,0,16,0,0,0,0,0,3,88,0,0,0,0,0,0,27,0))
f<-formula(Y~fac1+ offset(log(offset))|fac3+ fac2)
ZINB <-zeroinfl(f, dist = "negbin",link = "logit", data = MyDat)
summary(ZINB)
The primary goal of this model is to look at the effect of fac1 across the four levels. The other variables are more just artifacts of the sampling process.
Here is the output:
Call:
zeroinfl(formula = f, data = MyDat, dist = "negbin", link = "logit")
Pearson residuals:
Min 1Q Median 3Q Max
-0.418748 -0.338875 -0.265109 -0.001566 2.682920
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7192 0.9220 -1.865 0.062239 .
fac1B -4.4161 1.4700 -3.004 0.002663 **
fac1C -1.2008 1.2896 -0.931 0.351778
fac1D 0.1928 1.3003 0.148 0.882157
Log(theta) -1.7349 0.4558 -3.806 0.000141 ***
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.5899 210.8434 -0.055 0.956
fac3a2 -0.4775 2.4608 -0.194 0.846
fac3a3 -11.2284 427.5200 -0.026 0.979
fac3a4 10.7771 210.8056 0.051 0.959
fac3a5 -0.3135 2.3358 -0.134 0.893
fac2Y 11.8292 210.8298 0.056 0.955
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta = 0.1764
Number of iterations in BFGS optimization: 76
Log-likelihood: -63.82 on 11 Df
I have consulted papers and stats books and forums, but I'm still not sure how to present this information. What I really want is a bar plot showing the effects on the Y-axis and the 4 levels on the X.
If I understand correctly, level A of fac1 is currently set at 0, and is my reference level (please correct me if I'm wrong here). So, I can make a plot of the 4 levels (including level A as zero). This doesn't seem ideal. I would really like to have 95%CIs for all levels.
I can also use the predict function, however predict.zeroinfl does not give error estimates, and I'm unsure how to interpret the effect of the offset.
Similar papers have just put a boxplot of the original data next to a boxplot of the predictions and compared. I feel like I should be able to do better.
Below is the code and plot to create the predicted values:
MyDat$phat<-predict(ZINB, type="response")
MyDat$phat_os<-MyDat$phat/MyDat$offset
plot(phat~fac1, MyDat)
Is bootstrapping the way to go? I have tried this and run into all kinds of trouble for something I'm not sure is necessary.
Thank you in advance, and please go easy on me if I'm making a silly oversight/assumption. I'm still learning, but these stats feel a bit out of my reach.