GLS / GLM nested design with autocorrelation over time

Question

Still fairly new to GLM and a bit confused about how to establish my model.

About my project:

I sampled the microbiome (and measured a diversity index value = Shannon) from the root system of a sample of 9 trees (=tree1_cat).

In each tree I sampled fine and thick roots (=rootpart) and each tree was sampled four times (=days) over the course of one season. Thus I have a nested design but have to keep time in mind for autocorrelation. Also not all values are present, thus I have a few missing values). So far I have tried and tested the following:

Model <- gls(Shannon ~ tree1_cat/rootpart + tree1_cat + days,  
     na.action = na.omit, data = psL.meta, 
     correlation = corAR1(form =~ 1|days), 
     weights = varIdent(form= ~ 1|days))

Furthermore I've tried to get more insight and used anova(Model) to get the p-values of those factors. Am I allowed to use those p-values? Also I've used emmeans(Model, specs = pairwise ~ rootpart)for pairwise comparisons but since rootpart was entered as nested factor it only gives me the paired interactions.

It all works, but I am not sure, whether this is the right model! Any help would be highly appreciated!

do you really only have two trees? i.e. a total of 2 (trees) x 2 (fine vs thick) vs 4 (days) = 16 observations in your data set? If so, you need to keep your statistical approach **very** simple. I might suggest that you start by asking about possible approaches on [CrossValidated](https://stats.stackexchange.com), and come back to Stack Overflow when you're sure you know what you want to do ... — Ben Bolker, Mar 09 '22 at 01:45
Thanks for the reply. No I have 9 trees. So it would be 9 (trees) x 2 (fine vs thick) x 4 (days) = 81 measurements — mfbeuq, Mar 09 '22 at 08:31

Ben Bolker · Answer 1 · 2022-03-09T18:28:26.940

It would be helpful to know your scientific question, but let's suppose you're interested in differences in Shannon diversity between fine and thick roots and in time trends. A model you could use would be:

library(lmerTest)
lmer(Shannon ~ rootpart*days + (rootpart*days|tree1_cat), data = ...)

The fixed-effect component rootpart*days can be expanded into 1 + rootpart + days + rootpart:days (where 1 signifies the intercept)

intercept: SD in fine roots on day 0 (hopefully that's the beginning of the season)
rootpart: difference between fine and thick roots on day 0
days: change per day in SD in fine roots (slope)
rootpart:days difference in slope between thick roots and fine roots

The random-effect component (rootpart*days|tree1_cat) measures how all four of these effects vary across trees, and their correlations (e.g. do trees with a larger-than-average difference between fine and thick roots on day 0 also have a larger-than-average change over time in fine root SD?)

This 'maximal' random effects model is almost certainly too complex for your data; a rough rule of thumb says you should have 10-20 data points per parameter estimated, the fixed-effect model takes 4 parameters. A full model with 4 random effects requires the estimate of a 4×4 covariance matrix, which has (4*5)/2 = 10 parameters all by itself. I might just try (1+days|tree1_cat) (random slopes) or (rootpart|tree_cat) (among-tree difference in fine vs. thick differences), with a bias towards allowing for the variation in the effect that is your primary interest (e.g. if your primary question is about fine vs. thick then go with (rootpart|tree_cat).

I probably wouldn't worry about autocorrelation at all, nor heteroscedasticity by day (your varIdent(~1|days) term) unless those patterns are very strongly evident in the data.

If you want to allow for autocorrelation you'll need to fit the model with nlme::lme or glmmTMB (lmer still doesn't have machinery for autocorrelation models); something like

library(nlme)
lme(Shannon ~ rootpart*days,
    random = ~days|tree1_cat,
    data = ...,
    correlation = corCAR1(form = ~days|tree1_cat)
)

You need to use corCAR1 (continuous-time autoregressive order-1) rather than the more common corAR1 for unevenly sampled data. Be aware that lme is more finicky/worse at dealing with singular models, so you may discover you need to simplify your model before you can actually get this model to run.

Thanks so much for the detailed answer. Indeed I'm trying to find whether time, the root section (and less of a focus) the tree individual has an impact on the alpha diversity. I forgot to mention that the days where the samples were taken are not in equal distances to each other, so two timepoints are more closer to each other than to the other two. The thing why I am also interested in autocorrelation is that a I have second experiment with a similar experimental design where I have only one rootpart but twelve timepoints with large differences between their sampling dates (as above) — mfbeuq, Mar 09 '22 at 18:15

GLS / GLM nested design with autocorrelation over time

1 Answers1