It is difficult to know what you are asking.
From ?MuMIn::dredge
"Use of na.action = "na.omit" (R's default) or "na.exclude" in global.model must be avoided, as it results with sub-models fitted to different data sets, if there are missing values. Error is thrown if it is detected."
In your example, leaving the default options(na.action = na.omit) works fine:
library(lme4)  # provides glmer()
options()$na.action
mod.na.omit <- glmer(formula = pr ~ yr + soil_dist + sla_raw +
                       yr:soil_dist + yr:sla_raw + (1|plot) + (1|subplot),
                     data = coldat,
                     family = binomial)
But options(na.action = na.fail) causes glmer to fail, because coldat contains missing values and na.fail stops with an error whenever it finds any (as expected from the documentation).
If you compare the number of rows in coldat, the number of complete cases of coldat, and the number of rows used by mod.na.omit, you get the following:
> # number of rows in coldat
> nrow(coldat)
[1] 3171
> # number of complete cases in coldat
> nrow(coldat[complete.cases(coldat), ])
[1] 2551
> # number of rows in data included in glmer model when using 'na.omit'
> length(mod.na.omit@frame$pr)
[1] 2551
In the example data you provided, the complete cases of coldat and the rows of coldat that glmer keeps when using na.omit (mod.na.omit@frame) have the same number of rows. However, na.omit only drops rows with missing values in the variables a given model actually uses, so with a different set of predictors this may no longer hold (i.e., the number of rows in mod.na.omit@frame could exceed the number of complete cases of coldat). In that scenario (as the documentation states), there is a risk of sub-models being fitted to different data sets as dredge generates the models. So, rather than potentially fitting sub-models to different data sets, dredge takes a conservative approach to NA and throws an error.
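To make the risk concrete, here is a minimal sketch using simulated data and plain lm (not your coldat data or glmer; the data frame dat and the variables y, x1, x2 are made up for illustration). It shows how na.omit silently gives each sub-model a different number of rows:

```r
# Simulated data (hypothetical; not the coldat data from the question)
set.seed(1)
n <- 100
dat <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))
dat$x1[1:10]  <- NA   # 10 rows are missing x1
dat$x2[11:30] <- NA   # 20 other rows are missing x2

# na.omit drops only rows with NA in the variables each model uses
full <- lm(y ~ x1 + x2, data = dat, na.action = na.omit)
sub1 <- lm(y ~ x1,      data = dat, na.action = na.omit)
sub2 <- lm(y ~ x2,      data = dat, na.action = na.omit)

# Number of observations actually used by each fit
n_used <- c(full = nobs(full), sub1 = nobs(sub1), sub2 = nobs(sub2))
print(n_used)
# full sub1 sub2
#   70   90   80
```

Because the three fits use 70, 90 and 80 rows respectively, likelihood-based comparisons between them (such as AIC) would not be on the same data, which is the situation dredge refuses to allow.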
So, you basically either have to remove the incomplete cases (which you indicated is something you don't want to do) or interpolate the missing values. I typically avoid interpolation if there are large blocks of missing data which make estimating a value fraught, and remove incomplete cases instead.
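If you go the route of removing incomplete cases, one option is to restrict the check to the variables that appear in the global model, so that rows are not lost to missing values in unrelated columns. The following is only a sketch on simulated data with a plain glm (the data frame dat, the column unused and the vector model_vars are made up for illustration); the same pattern should carry over to your glmer global model before passing it to dredge, but that is not run here:

```r
# Simulated data (hypothetical; stands in for coldat)
set.seed(2)
n <- 200
dat <- data.frame(
  pr      = rbinom(n, 1, 0.5),
  yr      = factor(sample(2001:2003, n, replace = TRUE)),
  sla_raw = rnorm(n),
  unused  = rnorm(n)          # a column that is not in the model
)
dat$sla_raw[1:15] <- NA       # missing values in a model variable
dat$unused[16:60] <- NA       # missing values in an unrelated column

# Keep only the model variables, then drop rows that are incomplete in them
model_vars <- c("pr", "yr", "sla_raw")
dat_cc <- dat[complete.cases(dat[, model_vars]), model_vars]

# With no NAs left, na.fail no longer stops the fit
fit <- glm(pr ~ yr + sla_raw, data = dat_cc,
           family = binomial, na.action = na.fail)

nrow(dat_cc)  # 185 rows kept (only the 15 rows missing sla_raw are dropped)
nobs(fit)     # 185, so every sub-model would see the same rows
```

Checking complete.cases on the whole data frame instead would have kept only 140 rows here, so it is worth confirming which columns are responsible for the missing values before dropping anything.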