Probably a very simple question, but I am banging my head against the wall with this. To start with, I am not very familiar with R or stats beyond the basics. I work for an NGO monitoring an endangered species, so we have very little in the way of resources. I am trying to determine the trend of a population while accounting for very patchy data.
I have about 20 years of data. Each year, volunteers go to roost sites to count the birds leaving the roosts in the morning. There is a lot of variation from year to year in which sites are counted, and how many. I have also gathered other metrics that I believe may affect the numbers counted at any one time, such as moon phase (days until/since the nearest full moon) and cumulative precipitation over the previous 1, 3, 6, 12 and 24 months. Along with Year and Effort, these make up my independent variables.
My understanding is that I should use a GLM to see to what degree each variable affects the dependent variable (total counted), the idea being that I can check whether the population is really increasing, rather than the general increase in total counted simply being down to increased effort over the years.
I have played around with R and spent many hours googling, and it seems that a GLM was the right way to go, but I struggled to work out which model best described the relationship. I was then introduced to the negative binomial GLM (rather than a quasi-Poisson), which produces an AIC that would tell me about the relative fit of the models.
I was then introduced to the MuMIn dredge function, which basically shows me that the top-ranked models with a delta of less than 2 are the best descriptors.
My problem now is that I am so far down a rabbit hole that I don't understand at all, and I no longer know whether I'm even looking at the right information. So, to start with the basics: is a negative binomial the right GLM to use in my case?
> dput(head(totals))
structure(list(Year = c(2002L, 2003L, 2005L, 2006L, 2007L, 2008L
), Total = c(433L, 627L, 141L, 714L, 609L, 429L), Effort = c(10L,
13L, 14L, 25L, 27L, 21L), Rain.24 = c(957.45, 867.23, 1408.05,
1634.91, 1127.47, 859.42), Rain.12 = c(426.52, 440.71, 878.8,
756.11, 371.36, 488.06), Rain.6 = c(321.72, 272.84, 639.16, 542,
250.71, 395.59), Rain.3 = c(157.94, 65.25, 437.35, 351.1, 86.94,
129.66), Rain.1 = c(27.94, 8.74, 99.3, 70.79, 25.8, 21.05), Nearest.full.moon = c(2L,
7L, 4L, 14L, 6L, 4L)), row.names = c(NA, 6L), class = "data.frame")
My dependent variable is "Total" (the full 20-year vector, tot, is below) with the following distribution:
> tot
[1] 290 433 870 277 714 669 429 479 860 547 654 865 845 1085 883 583 1023 1097 1182
[20] 945
> skewness(tot)
[1] -0.1372469
> var(tot)
[1] 73319.84
> mean(tot)
[1] 736.5
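For what it's worth, the variance above is roughly 100 times the mean, which I gather is the overdispersion that rules out a plain Poisson. A minimal sketch of that check in base R, just reusing the tot vector above:

```r
# counts pasted from the output above
tot <- c(290, 433, 870, 277, 714, 669, 429, 479, 860, 547,
         654, 865, 845, 1085, 883, 583, 1023, 1097, 1182, 945)

# a Poisson model assumes variance roughly equal to the mean (ratio near 1);
# here the ratio is close to 100, which points to strong overdispersion
var(tot) / mean(tot)
```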
Here is the code that I am running for the main chunk of the analysis:
library(MASS)   # glm.nb
library(MuMIn)  # dredge, model.avg

# dredge() refuses a global model fitted with na.omit, so na.fail
# must be set *before* fitting
options(na.action = na.fail)

model3 <- glm.nb(Total ~ ., data = totals)
summary(model3)

res <- dredge(model3, trace = 2)
subset(res, delta <= 2, recalc.weights = FALSE)

summary(model.avg(res, revised.var = FALSE))
importance(res)  # called sw() in recent MuMIn versions

options(na.action = na.omit)  # restore the default
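In case it clarifies the question: my understanding is that, unlike quasi-Poisson, the negative binomial fit has a real likelihood, so it can be compared to a plain Poisson by AIC. A sketch of that comparison using just the six rows from dput() above and a single predictor (I am not sure this is the right way to test it):

```r
library(MASS)  # glm.nb

# the first six rows of the data, from dput() above
totals <- data.frame(Year   = c(2002L, 2003L, 2005L, 2006L, 2007L, 2008L),
                     Total  = c(433L, 627L, 141L, 714L, 609L, 429L),
                     Effort = c(10L, 13L, 14L, 25L, 27L, 21L))

pois <- glm(Total ~ Effort, family = poisson, data = totals)
nb   <- glm.nb(Total ~ Effort, data = totals)

# quasi-Poisson reports no AIC at all; Poisson vs negative binomial
# can be compared directly (lower AIC = better, penalising parameters)
AIC(pois, nb)
```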
Sorry if this makes no sense; I will try to edit the question according to any feedback.