Using proportions as an offset in a binomial (glm) model

Question

I am trying to test the effect of a treatment on the proportions of juveniles in a population of migrating birds. The birds were counted and identified as juveniles or adults daily, but the treatment was only on every second day. Days without treatment were used as a control. The problem is that the proportion of juveniles in the population is expected to be affected not only by the treatment, but also by migration phenology. For example, it is possible that on a given day more juveniles migrated to the study area, and therefor this, and not only the treatment, affected the proportion of juveniles in the population. To account for this problem, I also checked the proportion of juveniles every day at a close by site which was not affected by the treatment (i.e., control site). Hence, I have two types of controls. To analyze the data, I thought of using a binomial GLMM, with the proportion of juveniles as the variable of interest, the treatment as a categorical (with or without treatment) explanatory variable and day as a random-intercepts factor, and I use weights to account for the different number of birds in each day, but I am not sure how to input the data from the control site. From what I read, it should be used as an offset, but I am not sure exactly how.

Is the link function affected by the fact it (juveniles prop. at the ctrl. site) is a proportion? Is it better to use a the juveniles prop. at the ctrl. site in an interaction instead of offset (i.e., ~ Treatment* Juv.prop.cntrl.site)?

This is the model I have so far, but I am not sure if it makes sense, especially if the offset is set correctly:

glm(Juv.prop.exp.site ~ Treatment + Day, offset = Juv.prop.cntrl.site, weights = Tot.birds.exp.site, data = df, family = Binomial)

Where Juv.prop.exp.site is the number of juveniles divided by the total at this site (juveniles + adults) See the data here: DATA (day starts at 11, because during the first 10 days no birds of that species were observed)

score 1 · Answer 1 · answered Jan 01 '23 at 14:00

Normally, I would suggest that questions regarding statistical analysis are migrated to CrossValidated, where you will get better answers to purely statistical questions. However, in your case, it will help a lot to reshape your data into a tidy format before analysis, which is more of a programming problem.

Essentially, you need one column each for day, site, treatment, number of juveniles, and number of adults. I am assuming that in your data, "V" is the treatment and "X" is the control.

library(tidyverse)

df <- data %>%
  select(1, 2, 4, 5, 8, 9) %>%
  rename_all(~gsub("\\.site", "_site", .x)) %>%
  pivot_longer(1:4, names_sep = "\\.", names_to = c(".value", "Site")) %>%
  mutate(Treatment = ifelse(Site == "Exp_site", Treatment, "X")) %>%
  mutate(Treatment = ifelse(Treatment == "V", "Treatment", "Control")) %>%
  mutate(Site = ifelse(Site == "Exp_site", "Experimental", "Control")) %>%
  rename(Juveniles = Juv, Adults = Ad) %>%
  select(2, 1, 3:5)

This makes your data look like this, and to my mind this is easier to analyse (and to reason about):

df
#> # A tibble: 100 x 5
#>      Day Treatment Site         Juveniles Adults
#>    <int> <chr>     <chr>            <int>  <int>
#>  1    11 Control   Experimental         1      0
#>  2    11 Control   Control              0      0
#>  3    12 Treatment Experimental         2      1
#>  4    12 Control   Control              1      0
#>  5    13 Control   Experimental         2      0
#>  6    13 Control   Control              1      1
#>  7    14 Treatment Experimental         6      3
#>  8    14 Control   Control              4      2
#>  9    15 Control   Experimental         6      4
#> 10    15 Control   Control              1      2
#> # ... with 90 more rows
#> # i Use `print(n = ...)` to see more rows

You can then perform a binomial glm like this, with Treatment and Site as independent variables.

model <- glm(cbind(Juveniles, Adults) ~ Treatment + Site, 
             data = df, family = binomial)

summary(model)
#> Call:
#> glm(formula = cbind(Juveniles, Adults) ~ Treatment + Site, family = binomial, 
#>     data = df)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -3.4652  -0.6971   0.0000   0.7895   2.9541  
#> 
#> Coefficients:
#>                    Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)          1.0059     0.1461   6.886 5.74e-12 ***
#> TreatmentTreatment   0.3012     0.2877   1.047    0.295    
#> SiteExperimental    -0.1632     0.2598  -0.628    0.530    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 118.16  on 88  degrees of freedom
#> Residual deviance: 117.07  on 86  degrees of freedom
#> AIC: 244.13
#> 
#> Number of Fisher Scoring iterations: 4

The way I organized the data was to allow using it as an offset (this is a programming problem - I don't know how to specify it in r but indeed also a statistical question about the link function). There is no interaction between "treatment" or "site". Nor an offset. I'd be happy to know why, because I don't really understand the presented model in respect to my question, THANKS — Yael Lehnardt, Jan 02 '23 at 17:32
@YaelLehnardt the Intercept in the above model means that in the control site, the log odds of a bird being a juvenile is 1.0059. This means the probability is 0.732. The log odds of a bird being a juvenile in the control group at the experimental site is 1.0059 - 0.1632, meaning a probability of 0.699. The effect of treatment was to increase the log odds at the experimental site by 0.3012, to a probability of 0.758. There _can't_ be an interaction in treatment and site if the treatment only applies at a single site. — Allan Cameron, Jan 02 '23 at 18:12
Thank you very much for your time and effort, I see now what you've meant regarding the interaction. This suggested model still doesn't fully address the variability in juveniles ratio at the control site along the season (i.e. migration waves) but rather treats it as a permanent value. If there was no seasonal variation, the control site was not needed (only the control days at trt site). Is there a way to account for this variability using offset in binom. model? — Yael Lehnardt, Jan 03 '23 at 07:31

Using proportions as an offset in a binomial (glm) model

1 Answers1