In the function `predict.merMod` of the `lme4` package, what is the difference between the following arguments: `allow.new.levels=TRUE`, `re.form=NA`, and `re.form=~0`, if we have only a random intercept?

- I edited my answer a bit in response to your edit, but it doesn't make much difference. If you don't understand my answer, perhaps you could clarify/focus your question? – Ben Bolker Apr 14 '21 at 21:24
1 Answer
`re.form`: ... if `NA` or `~0`, include no random effects.
In other words, either of these choices makes predictions for all observations (or sets of predictors specified in `newdata`) at the population level, setting all random effects to zero.
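To make this concrete, here is a minimal sketch. It assumes the `sleepstudy` data shipped with `lme4` and a random-intercept model chosen purely for illustration (neither is from the question). It checks that `re.form=NA` and `re.form=~0` give the same predictions, and that both match the fixed-effects-only linear predictor.

```r
library(lme4)

# Illustrative random-intercept model (assumption: sleepstudy, not the asker's data)
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Two ways of asking for population-level predictions (no random effects)
p_na   <- predict(fit, re.form = NA)
p_zero <- predict(fit, re.form = ~0)

# Fixed-effects-only linear predictor computed by hand: X %*% beta
p_manual <- drop(model.matrix(fit) %*% fixef(fit))

# Default behaviour (re.form = NULL) conditions on all random effects
p_cond <- predict(fit)

same_na_zero   <- isTRUE(all.equal(unname(p_na), unname(p_zero)))
same_as_manual <- isTRUE(all.equal(unname(p_na), unname(p_manual)))
differs_cond   <- !isTRUE(all.equal(unname(p_na), unname(p_cond)))

print(c(same_na_zero, same_as_manual, differs_cond))
```

The last comparison is only there to show that the default `re.form=NULL` predictions do differ, because they add the estimated subject-level intercepts.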
`allow.new.levels`: logical if new levels (or `NA` values) in `newdata` are allowed. If `FALSE` (default), such new values in `newdata` will trigger an error; if `TRUE`, then the prediction will use the unconditional (population-level) values for data with previously unobserved levels (or `NA`s).
In other words, population-level predictions are made only (assuming `re.form` is not `NA` or `~0`) for observations/sets of predictor values where the random-effect grouping variable is `NA` or a level that did not occur in the original data set used to fit. (If only a subset of the grouping variables in a model with multiple grouping variables are set to `NA`/new values, only the random effects corresponding to those grouping variables will be set to zero [this detail is only relevant if there is more than one random-effect term in the model].)
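A small sketch of that row-by-row behaviour, again assuming the `sleepstudy` data from `lme4` and an illustrative random-intercept model; the level name `"new_subject"` is hypothetical. One row of `newdata` uses an observed subject and one uses an unseen level.

```r
library(lme4)

# Illustrative random-intercept model (assumption: sleepstudy, not the asker's data)
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Row 1: a subject seen during fitting; row 2: a hypothetical unseen level
new_dat <- data.frame(
  Days    = c(5, 5),
  Subject = factor(c("308", "new_subject"))
)

# Default (allow.new.levels = FALSE): the unseen level triggers an error
res <- try(predict(fit, newdata = new_dat), silent = TRUE)
errors_by_default <- inherits(res, "try-error")

# allow.new.levels = TRUE: only the unseen-level row falls back to population level
p_allow <- predict(fit, newdata = new_dat, allow.new.levels = TRUE)

# re.form = NA: every row is predicted at the population level
p_pop <- predict(fit, newdata = new_dat, re.form = NA)

new_row_is_population <- isTRUE(all.equal(unname(p_allow[2]), unname(p_pop[2])))
known_row_is_conditional <- abs(unname(p_allow[1]) - unname(p_pop[1])) > 1e-8

print(c(errors_by_default, new_row_is_population, known_row_is_conditional))
```

So with a single random intercept, the three settings coincide only when every row of `newdata` has a new (or `NA`) level; rows with known levels still get their group-specific intercept under `allow.new.levels=TRUE`.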

- So in my case, I think the three settings are equivalent, since I have only a random intercept and I don't have the projection levels in my original data. Also, when I set `re.form=NULL` I found better performance in the cross-validation than with `re.form=NA`, but my problem is that I don't have the levels of the future predictions in my original data. – user1988 Apr 15 '21 at 16:37
- That sounds about right. Of course you will have better performance if you have access to group-specific information and use it (i.e. `re.form=NULL`) ... – Ben Bolker Apr 16 '21 at 00:59
- What I'm thinking of doing is to take advantage of this: use `re.form=NULL` for the cross-validation and, for the future projections (I have raster files), define a constant year, for example `predict(Raster.stack, GLMM, const=(data.frame(Year=2011)))`. What do you think about this approach? – user1988 Apr 16 '21 at 15:08
- I think that's silly if the projections are for future (unknown) years. You should do the cross-validation for the same scenario for which you *actually* want to predict, i.e. `re.form=NA`. You should also be careful about your cross-validation, i.e. consider block cross-validation (see Roberts et al. “Cross-Validation Strategies for Data with Temporal, Spatial, Hierarchical, or Phylogenetic Structure.” *Ecography*, December 1, 2016, n/a-n/a. https://doi.org/10.1111/ecog.02881) – Ben Bolker Apr 16 '21 at 18:41