I have a dataset with both numeric and categorical variables, which I would like to include in a generalized linear mixed model. When I do so, the output of the conditional model always "forgets" one category.

For example, in this model the response variable is the proportion of time spent vigilant out of the total time detected per video. The explanatory variables are: urine intensity (numeric), treatment (0 for no urine, 1 for urine), diel_period (dawn, dusk, night, day), sex (Male, Female, Undefined) and height (of trees, numeric). My 50 cameras (1 to 50) enter as a random grouping effect.

bBI_mod8 <- glmmTMB(cbind(vigilance, total_time_behaviour - vigilance) ~ 
                    urine_intensity_heatmap + treatment + diel_period + sex + height + (1|camera),
                ziformula = ~1, data = df_behaviour, family = "betabinomial")

The vigilance proportion is modelled with a zero-inflated beta-binomial regression.

summary(bBI_mod8)

When I show the output, I observe:

 Family: betabinomial  ( logit )
Formula:          cbind(vigilance, total_time_behaviour - vigilance) ~ urine_intensity_heatmap +  
    treatment + diel_period + sex + height + (1 | camera)
Zero inflation:                                                      ~1
Data: df_behaviour

     AIC      BIC   logLik deviance df.resid 
  2973.8   3037.1  -1474.9   2949.8     1439 

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 camera (Intercept) 0.1583   0.3979  
Number of obs: 1451, groups:  camera, 50

Overdispersion parameter for betabinomial family (): 1.85 

Conditional model:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)             -0.907429   0.471376  -1.925 0.054222 .  
urine_intensity_heatmap -0.009844   0.004721  -2.085 0.037034 *  
treatment1              -0.219403   0.154396  -1.421 0.155304    
diel_periodDay          -0.337329   0.235033  -1.435 0.151218    
diel_periodDusk         -0.543771   0.285322  -1.906 0.056675 .  
diel_periodNight        -0.553826   0.274879  -2.015 0.043925 *  
sexMale                 -0.772731   0.168350  -4.590 4.43e-06 ***
sexUndefined            -1.010425   0.271876  -3.716 0.000202 ***
height                   0.001713   0.012352   0.139 0.889681    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.6685     0.4298  -1.556     0.12

My problem is that, as you can see, one category is always omitted for each of my categorical variables:

treatment1 but not treatment0

diel_periodDay, diel_periodDusk, diel_periodNight but not diel_periodDawn

sexMale, sexUndefined but not sexFemale

How can I solve this problem? Or how can I show a more complete output?

Peter
Charlotte
  • Each categorical predictor has a reference level. In this case (using sex), Female is the reference level, so the estimates are compared to it. – NelsonGon Apr 07 '21 at 10:41

1 Answer

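In the output of generalized linear models in R, each estimate for a categorical variable is the difference between that level and the variable's reference level, on the scale of the link function (logit here). Unless you specify otherwise, the reference level is the first factor level, and factor levels are sorted alphabetically by default.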
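To see this default in isolation, here is a minimal base-R sketch. The vector is invented toy data that only reuses the diel_period labels from the question; it is not the asker's df_behaviour.

```r
# Toy vector reusing the diel_period labels from the question (values invented).
diel_period <- factor(c("Night", "Dawn", "Dusk", "Day", "Dawn"))

# Levels are sorted alphabetically, so "Dawn" comes first.
levels(diel_period)

# Default treatment contrasts: one indicator column per non-reference level.
# The row for the first level is all zeros, which is why it gets no
# coefficient of its own in a model summary.
contrasts(diel_period)
```

The contrast matrix has columns for Day, Dusk and Night only, matching the three diel_period rows in the summary.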

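In the summary above, taking sex as an example, the estimate for sexMale is the effect of being Male relative to being Female, and sexUndefined is the effect of being Undefined relative to Female. Likewise, treatment1 is the effect of treatment 1 relative to treatment 0, and each diel_period estimate is relative to Dawn. The reference levels are not dropped from the model: they are contained in the (Intercept) term.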
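As a rough illustration of how the reference level sits in the intercept, the sketch below rebuilds the linear predictor for each sex from the estimates printed in the question. It assumes the other factors are at their reference levels (treatment 0, Dawn), the numeric covariates are 0 and the camera random effect is 0, and it ignores the zero-inflation part, so the values are for illustration only.

```r
# Fixed-effect estimates copied from the conditional model in the question.
b <- c(intercept = -0.907429, sexMale = -0.772731, sexUndefined = -1.010425)

# Linear predictor on the logit scale for each sex. Female is the reference
# level, so its value is the intercept alone.
eta <- c(Female    = b[["intercept"]],
         Male      = b[["intercept"]] + b[["sexMale"]],
         Undefined = b[["intercept"]] + b[["sexUndefined"]])

# Back-transform to the proportion scale with the inverse logit.
round(plogis(eta), 3)
```

Every level therefore has a fitted value; the summary simply reports the non-reference levels as offsets from the reference.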

You can override this by manually setting the reference level to whichever level you prefer, for example with relevel(). As it stands, your reference levels are Female, treatment 0 and Dawn, chosen solely by alphabetical order.
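A minimal sketch of changing the reference level in base R follows. The data frame df_toy is a hypothetical stand-in for df_behaviour with invented values, and the choice of Male and Night as new references is arbitrary.

```r
# Hypothetical stand-in for df_behaviour (values invented for illustration).
df_toy <- data.frame(
  sex         = factor(c("Female", "Male", "Undefined", "Male")),
  diel_period = factor(c("Dawn", "Day", "Dusk", "Night"))
)

# relevel() moves the chosen level to the front, making it the reference.
df_toy$sex <- relevel(df_toy$sex, ref = "Male")

# factor(levels = ...) sets the whole order explicitly; the first entry
# becomes the reference.
df_toy$diel_period <- factor(df_toy$diel_period,
                             levels = c("Night", "Dawn", "Day", "Dusk"))

# The design-matrix columns now omit Male and Night instead of Female and Dawn.
colnames(model.matrix(~ sex + diel_period, data = df_toy))
```

After relevelling the columns in your own data, refit the model with the same call. The fit itself does not change, only which comparisons the coefficients express.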

NelsonGon