on the predict.merMod function arguments

Question

In the function predict.merMod of the lme4 package, what is the difference between the following arguments: allow.new.levels=TRUE, re.form=NA and re.form=~0 if we have only a random intercept?

I edited my answer a bit in response to your edit, but it doesn't make much difference. If you don't understand my answer, perhaps you could clarify/focus your question? — Ben Bolker, Apr 14 '21 at 21:24

Ben Bolker · Answer 1 · 2021-04-14T21:24:23.963


re.form: ...if NA or ~0, include no random effects.

In other words, either of these choices makes predictions for all observations (or sets of predictors specified in newdata) at the population level, setting all random effects to zero.
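As a minimal sketch of this equivalence, the following fits a random-intercept model to the `sleepstudy` data that ships with lme4 (an illustrative stand-in, not the asker's data) and compares the two population-level spellings against the fixed-effects-only linear predictor.

```r
library(lme4)

# Random-intercept model on lme4's built-in sleepstudy data (illustrative only)
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

p_na   <- predict(fit, re.form = NA)    # population level
p_zero <- predict(fit, re.form = ~0)    # population level, alternative spelling
p_cond <- predict(fit, re.form = NULL)  # default: conditional on the random effects

# Fixed-effects-only linear predictor, computed by hand
p_fixed <- drop(getME(fit, "X") %*% fixef(fit))

all.equal(unname(p_na), unname(p_zero))   # the two spellings should agree
all.equal(unname(p_na), unname(p_fixed))  # and both should equal X %*% beta
```

With `re.form = NULL` the estimated subject-specific intercepts are added, so `p_cond` generally differs from the other two.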

allow.new.levels: logical if new levels (or NA values) in newdata are allowed. If FALSE (default), such new values in newdata will trigger an error; if TRUE, then the prediction will use the unconditional (population-level) values for data with previously unobserved levels (or NAs).

In other words, assuming re.form is not NA or ~0, population-level predictions are made only for observations/sets of predictor values where the random-effect grouping variable is NA or a level that did not occur in the data set used to fit the model. (If only a subset of the grouping variables in a model with multiple grouping variables is set to NA/new values, only the random effects corresponding to those grouping variables will be set to zero; this detail is relevant only if there is more than one random-effect term in the model.)
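A small sketch of this row-by-row behaviour, again using lme4's `sleepstudy` data as a stand-in. The level `"new_subject"` is made up for illustration and does not occur in the fitted data.

```r
library(lme4)

fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# One subject seen during fitting ("308") and one hypothetical unseen subject
nd <- data.frame(
  Days    = c(5, 5),
  Subject = factor(c("308", "new_subject"))
)

# Default allow.new.levels = FALSE: the unseen level triggers an error
res <- try(predict(fit, newdata = nd), silent = TRUE)
inherits(res, "try-error")

# allow.new.levels = TRUE: known levels keep their random effect,
# unseen levels fall back to the population-level prediction
p_mixed <- predict(fit, newdata = nd, allow.new.levels = TRUE)

# re.form = NA: every row is predicted at the population level
p_pop <- predict(fit, newdata = nd, re.form = NA)

unname(p_mixed[2]) - unname(p_pop[2])  # new level: no difference expected
unname(p_mixed[1]) - unname(p_pop[1])  # known level: offset by its random intercept
```

So if every row of newdata has a new level, as in the asker's situation, `allow.new.levels = TRUE` and `re.form = NA` should give the same predictions for a model with a single random intercept.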

edited Apr 14 '21 at 21:24

answered Apr 14 '21 at 21:05

Ben Bolker


So in my case, I think that the setting of the three arguments is equivalent since I have only a random intercept and I don't have the projection level in my original data. Also when I set re.form=NULL I found better performance in the cross-validation than setting re.form=NA but my problem is I don't have the levels of the future prediction in my original data. – user1988 Apr 15 '21 at 16:37
That sounds about right. Of course you will have better performance if you have access to group-specific information and use it (i.e. `re.form=NULL`) ... – Ben Bolker Apr 16 '21 at 00:59
What I'm thinking to do is to get benefit of this. Use the argument re.form=NULL for the cross-validation and for the future projections (I have raster files) defining a constant year for example: predict(Raster.stack, GLMM, const=(data.frame(Year=2011))). What do you think about this approach? – user1988 Apr 16 '21 at 15:08
I think that's silly if the projections are for future (unknown) years. You should do the cross-validation for the same scenario for which you *actually* want to predict, i.e. `re.form=NA`. You should also be careful about your cross-validation, e.g. consider block cross-validation (see Roberts et al. “Cross-Validation Strategies for Data with Temporal, Spatial, Hierarchical, or Phylogenetic Structure.” *Ecography*, December 1, 2016. https://doi.org/10.1111/ecog.02881) – Ben Bolker Apr 16 '21 at 18:41
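A rough sketch of what cross-validating "for the same scenario" could look like: hold out one whole group at a time and predict it with `re.form=NA`, so that the held-out level is always unseen. It uses lme4's `sleepstudy` data with `Subject` standing in for the asker's grouping variable (Year); the model formula and the RMSE score are assumptions for illustration, not the asker's setup, and this is only the simplest form of blocking.

```r
library(lme4)

subjects <- levels(sleepstudy$Subject)

# Leave-one-group-out: each fold holds out every row of one subject
rmse <- sapply(subjects, function(s) {
  train <- droplevels(subset(sleepstudy, Subject != s))
  test  <- subset(sleepstudy, Subject == s)

  fit <- lmer(Reaction ~ Days + (1 | Subject), data = train)

  # The held-out subject is a new level, so predict at the population level
  pred <- predict(fit, newdata = test, re.form = NA)

  sqrt(mean((test$Reaction - pred)^2))
})

mean(rmse)  # average out-of-group prediction error
```

Scoring the folds with `re.form=NULL` instead would not be possible here without `allow.new.levels=TRUE`, which falls back to the same population-level values for the held-out group.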
