Binomial negative distribution

Question

I am learning how to use glms to test hypothesis and to see how variables relate among themselves. I am trying to see if the variable tick prevalence (Parasitized individuals/Assessed individuals)(dependent variable) is influenced by the number of captured hosts (independent variable).

My data looks like figure 1.(116 observations).

I have read that one way to know which distribution to use is to see which distribution the dependent variable has. So I built a histogram for the TickPrev variable (figure 2).

I got to the conclusion that the binomial negative distribution would be the best option. Before I ran the analysis, I transformed the TickPrevalence variable (it was a proportion, and the glm.nb only works with integers) applying the following codes:

df <- df %>% mutate(TickPrev=TickPrev*100)
df$TickPrev <- as.integer(df$TickPrev)

Then I applied the glm.nb function from the MASS package, and obtained this summary

summary(glm.nb(df$TickPrev~df$Captures, link=log))

Call:

glm.nb(formula = df15$TickPrev ~ df15$Captures, link = log, init.theta = 1.359186218)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.92226  -0.69841  -0.08826   0.44562   1.70405  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    3.438249   0.125464  27.404   <2e-16 ***
df15$Captures -0.008528   0.004972  -1.715   0.0863 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(1.3592) family taken to be 1)
 
Null deviance: 144.76  on 115  degrees of freedom
Residual deviance: 141.90  on 114  degrees of freedom

AIC: 997.58

Number of Fisher Scoring iterations: 1
              Theta:  1.359 
          Std. Err.:  0.197 

 2 x log-likelihood:  -991.584

I know that the p-value indicates that there isn't enough proves to believe that the two variables are related. However, I am not sure if I used the best model to fit the data and how I can know that. Can you please help me? Also, knowing what I show, is there a better way to see if this variables are related? Thank you very much.

This is more of a statistical question, I vote to migrate it to cross validated. I can briefly answer some of it here — StupidWolf, Jul 25 '21 at 16:10
By converting your prevalence into an integer, you are assuming the number of assessed individuals is the same for all rows. I don't know whether this is true. If you have this information, you should use it — StupidWolf, Jul 25 '21 at 16:11
it is not clear what is the size of your dataset. you should first plot the number of captures vs the prevalence to look at the trend first before bombarding a linear model which assumes it is a linear relationship between number of captures and prevalence — StupidWolf, Jul 25 '21 at 16:13
if tick prevalence is **the number of individuals with at least one tick** (rather than number of ticks per individual [parasite **burden**]), then you will want to use a *binomial* model (or quasi-binomial, or beta-binomial) instead. Are all captured individuals assessed? — Ben Bolker, Jul 25 '21 at 17:13

Binomial negative distribution

0 Answers0