
I have discrete count data (trap_catch) for two groups within the variable in_tree (1 = trap in a tree, 0 = trap not in a tree), and I want to test whether counts differ between these two groups. The data is overdispersed and contains many zeroes, so I have concluded that I need a hurdle model. Is this OK?

trap_id  trap_catch  in_tree
      1           0        0
      2          10        1
      3           0        0
      4           0        1
      5           9        1
      6           3        0

Here is an example of how the data is set up. My code is as follows:

library(pscl)  # provides hurdle()
mod.hurdle <- hurdle(trap_catch ~ in_tree, data = data, dist = "negbin")
summary(mod.hurdle)
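For reference, the model fitted by this call can be written in the standard two-part hurdle form (the notation is mine, not from pscl's output). A logit model describes whether a trap catches anything at all, and a zero-truncated negative binomial describes how much it catches when it does:

$$
\Pr(Y_i = 0) = 1 - \pi_i, \qquad
\Pr(Y_i = y) = \pi_i \,\frac{f_{\mathrm{NB}}(y;\, \mu_i, \theta)}{1 - f_{\mathrm{NB}}(0;\, \mu_i, \theta)}, \quad y = 1, 2, \dots
$$

$$
\operatorname{logit}(\pi_i) = \gamma_0 + \gamma_1\,\texttt{in\_tree}_i, \qquad
\log \mu_i = \beta_0 + \beta_1\,\texttt{in\_tree}_i
$$

In the summary output, the "Zero hurdle model coefficients" are the $\gamma$s, the "Count model coefficients" are the $\beta$s, and `Log(theta)` is $\log\theta$.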

The results I get are as follows, and they look very different from any examples I have read:

Pearson residuals:
   Min      1Q  Median      3Q     Max 
-0.8986 -0.6635 -0.2080  0.2474  6.8513 

Count model coefficients (truncated negbin with log link):
           Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.2582     0.1285   9.793  < 2e-16 ***
in_tree       1.3722     0.3100   4.426 9.58e-06 ***
Log(theta)   -0.2056     0.2674  -0.769    0.442    
Zero hurdle model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)    1.5647     0.1944   8.049 8.32e-16 ***
in_tree      16.0014  1684.1379   0.010    0.992    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Theta: count = 0.8142
Number of iterations in BFGS optimization: 8 
Log-likelihood: -513.7 on 5 Df

I am confused as to how to interpret these results.
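One way to start reading the output is to transform the coefficients back to the response scale. The sketch below only plugs the printed estimates into base R: `exp()` for the log link of the count part, `plogis()` for the logit link of the hurdle part. It assumes pscl's convention that the zero hurdle part models the probability of a *non-zero* catch. It deliberately leaves out the hurdle `in_tree` coefficient (16.0, SE 1684), which should not be interpreted as it stands.

```r
# Estimates copied by hand from the summary(mod.hurdle) output above
count_intercept <- 1.2582   # count model (Intercept)
count_in_tree   <- 1.3722   # count model in_tree
zero_intercept  <- 1.5647   # zero hurdle model (Intercept)
log_theta       <- -0.2056  # count model Log(theta)

# Count part (log link): means of the untruncated negative binomial
mu_not_tree <- exp(count_intercept)                  # ~3.52
mu_in_tree  <- exp(count_intercept + count_in_tree)  # ~13.9
rate_ratio  <- exp(count_in_tree)                    # ~3.94 times higher in trees

# Zero hurdle part (logit link): P(trap_catch > 0) for traps not in a tree
p_nonzero_not_tree <- plogis(zero_intercept)         # ~0.83

# Dispersion parameter, printed as "Theta: count"
theta <- exp(log_theta)                              # ~0.814

# Expected catch for a not-in-tree trap, combining both parts:
# E[Y] = P(Y > 0) * mu / (1 - P_NB(0))
p0_not_tree <- dnbinom(0, size = theta, mu = mu_not_tree)
expected_not_tree <- p_nonzero_not_tree * mu_not_tree / (1 - p0_not_tree)

round(c(mu_not_tree = mu_not_tree, mu_in_tree = mu_in_tree,
        rate_ratio = rate_ratio, p_nonzero_not_tree = p_nonzero_not_tree,
        theta = theta, expected_not_tree = expected_not_tree), 3)
```

With the fitted object itself, `predict(mod.hurdle, newdata = data.frame(in_tree = c(0, 1)), type = "response")` should return the combined expected catch for each group. That is usually easier to report than the raw coefficients, although the in-tree value inherits the separation problem discussed in the comments.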

I apologise in advance for my lack of understanding - I am very new to this type of analysis.

moth_lady
  • The standard error in the binomial part of the model is very large, the model might not be converging well. Do the vast majority of catches occur when the trap is in a tree? – Marius Aug 29 '19 at 03:37
  • The two groups are not equal in size: I have n=15 in tree and n=185 not in tree. Could this be due to my small sample size for the in-tree group? I originally log-transformed the count data and did a Mann-Whitney U test, which found quite a significant difference in trap catch, but I was concerned about the zeroes, so I am looking to fit a zero-inflated model. – moth_lady Aug 29 '19 at 04:04
  • The sample size might contribute, but the issue I was getting at was [separation](https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logistic-regression-and-what-are-some-strategies-to-deal-with-the-issue/). The coefficient of 16 for `in_tree` suggests catches are much, much more likely to occur in trees, leading to potential separation in the model. – Marius Aug 29 '19 at 04:08
  • I see, and it does seem likely that quasi-complete separation is occurring. Any ideas how to handle this in R? An article I found claims "The easiest strategy is "Do nothing". This is because that the maximum likelihood for other predictor variables are still valid as we have seen from previous section. The drawback is that we don’t get any reasonable estimate for the variable that predicts the outcome variable so nicely" So can I still extract any useful conclusions from that output? – moth_lady Aug 29 '19 at 04:50
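A quick way to check whether separation is behind the hurdle estimate of 16.0 (SE 1684) is to cross-tabulate "any catch" against `in_tree`. An empty cell, for example no in-tree trap with a zero catch, is the signature of quasi-complete separation in the zero hurdle part. The sketch below uses only the six example rows from the question as a hypothetical stand-in, so it will not reproduce the problem. Run the same `table()` call on the full data frame.

```r
# Hypothetical stand-in: only the six example rows shown in the question.
# Replace with the full data frame (n = 200) to run the real check.
data <- data.frame(
  trap_id    = 1:6,
  trap_catch = c(0, 10, 0, 0, 9, 3),
  in_tree    = c(0, 1, 0, 1, 1, 0)
)

# Cross-tabulate the hurdle outcome (any catch vs none) against in_tree.
# A zero count in any cell indicates (quasi-)complete separation
# for the binomial hurdle part of the model.
hurdle_tab <- table(in_tree = data$in_tree, caught = data$trap_catch > 0)
print(hurdle_tab)

# TRUE if at least one cell is empty
any(hurdle_tab == 0)
```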