
I am fitting several mixed models with the lmer function from the lme4 package, each with the same fixed and random effects but different response variables. The purpose of these models is to determine how environmental conditions influence different fruit and seed traits in a particular tree species. I want to know which traits respond most strongly to which environmental variables, and how well each model captures the variation in its trait overall.

The data have been collected from several sites, with several trees sampled within each site.

Response variables: measures of fruits and seeds, e.g. fresh mass, dry mass, volume

Fixed effects: temperature, rainfall, soil nitrogen, soil phosphorus

Random effects: Sites, trees

Example of model notation I have been using:

library(lme4)

lmer(fruit.mass ~ temperature + rainfall + soil.N + soil.P +
     (1|site/tree), data = fruit)

My problem: some of the models run fine with no detectable issues; however, some produce a singular fit, where the estimated variance for 'site' is 0.
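
For example, this is roughly how the issue shows up when I check an individual model (a minimal sketch using the notation above; isSingular() and VarCorr() are lme4 functions):

library(lme4)

fit <- lmer(fruit.mass ~ temperature + rainfall + soil.N + soil.P +
            (1|site/tree), data = fruit)

# TRUE when a variance component is estimated on the boundary (here, site = 0)
isSingular(fit)

# inspect the estimated variance components directly
VarCorr(fit)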

I know there is considerable debate about how to deal with singular fits; one approach is to drop the site random effect and keep the tree-level random effect. The models run fine after this.
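
In model notation, the reduced model would be the following (a sketch; (1|site/tree) expands to (1|site) + (1|site:tree), so dropping site leaves only the nested tree term):

lmer(fruit.mass ~ temperature + rainfall + soil.N + soil.P +
     (1|site:tree), data = fruit)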

My question: should I then drop the site random effect from the models that weren't singular if I want to compare these models in any way? If so, are there methods for comparing model performance that are better suited to this situation?

If this is covered in publications or discussion threads then any links would be much appreciated.

Ben Bolker

2 Answers


When a model converges to a singular fit, it indicates that the random-effects structure is overfitted, so I would argue that it does not make sense to compare these models. I would also be concerned about multiple testing in this situation.

Robert Long

I would drop the models that give "boundary (singular) fit" warnings. The rest are fine, and you can compare among them. Of note: it is best to specify REML = FALSE when comparing models (I can give references if need be). Once the "best" model is selected, you can refit it normally (i.e. with REML). I would recommend using conditional AIC (the cAIC4 package), for example anocAIC(model1, model2, ...). Another good option is the performance package, which has many useful functions, such as model_performance and check_model.
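
A minimal sketch of what that workflow might look like (m1 and m2 are hypothetical names for two candidate fits of the same response):

library(lme4)
library(cAIC4)
library(performance)

# fit with maximum likelihood (REML = FALSE) for model comparison
m1 <- lmer(fruit.mass ~ temperature + rainfall + soil.N + soil.P +
           (1|site/tree), data = fruit, REML = FALSE)
m2 <- lmer(fruit.mass ~ temperature + rainfall + soil.N + soil.P +
           (1|site:tree), data = fruit, REML = FALSE)

# compare the candidates by conditional AIC
anocAIC(m1, m2)

# fit indices (AIC, R2, ICC, RMSE, ...) for a single model
model_performance(m1)

# visual diagnostics for the chosen model, refitted with REML
m.final <- update(m1, REML = TRUE)
check_model(m.final)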