I think the model frame is returned as a protection against non-standard evaluation.
Let's look at a small example.
dat <- data.frame(x = runif(10), y = rnorm(10))
FIT <- lm(y ~ x, data = dat)
fit <- FIT; fit$model <- NULL
What is the difference between these two calls?
model.frame(FIT)
model.frame(fit)
Checking methods(model.frame) and stats:::model.frame.lm shows that in the first case the model frame is efficiently extracted from FIT$model, while in the second case it is reconstructed from fit$call and model.frame.default. This difference also leads to a difference between
# depends on `model.frame`
model.matrix(FIT)
model.matrix(fit)
because the model matrix is built from the model frame. If we dig further, we see that these differ in the same way:
# depends on `model.matrix`
predict(FIT)
predict(fit)
# depends on `predict.lm`
plot(FIT)
plot(fit)
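As long as dat is still around, both routes should give the same results, which is why the difference is easy to overlook. A minimal, self-contained sketch of that check (the seed is my addition, only there for reproducibility):

```r
set.seed(42)                       # assumption: seed added only for reproducibility
dat <- data.frame(x = runif(10), y = rnorm(10))
FIT <- lm(y ~ x, data = dat)       # keeps the model frame in FIT$model
fit <- FIT; fit$model <- NULL      # same fit, model frame dropped

is.null(FIT$model)                 # FALSE: the stored model frame is available
is.null(fit$model)                 # TRUE: it must be rebuilt from fit$call

# Both routes should agree while `dat` still exists
same_mm   <- all.equal(model.matrix(FIT), model.matrix(fit))
same_pred <- all.equal(predict(FIT), predict(fit))
same_mm
same_pred
```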
Note that this is where the problem can arise. If we deliberately remove dat, the model frame cannot be reconstructed, and all of these will fail:
rm(dat)
model.frame(fit)
model.matrix(fit)
predict(fit)
plot(fit)
while the same calls using FIT will work.
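To observe the failure without halting a script, the calls can be wrapped in tryCatch. This is only a sketch: it rebuilds the objects so that it can be run on its own, and the helper name try_msg is hypothetical, not part of any package.

```r
dat <- data.frame(x = runif(10), y = rnorm(10))
FIT <- lm(y ~ x, data = dat)
fit <- FIT; fit$model <- NULL
rm(dat)

# Hypothetical helper: return "ok" on success, the error message otherwise
try_msg <- function(expr) {
  tryCatch({ expr; "ok" }, error = function(e) conditionMessage(e))
}

try_msg(model.frame(FIT))    # uses the stored model frame, no need for `dat`
try_msg(model.frame(fit))    # has to re-evaluate the call, which needs `dat`
try_msg(predict(fit))        # goes through model.matrix, then model.frame
```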
And that is not the worst of it. The following example, which involves non-standard evaluation, is really bad!
fitting <- function(myformula, mydata, keep.mf = FALSE) {
  b <- lm(formula = myformula, data = mydata, model = keep.mf)
  par(mfrow = c(2, 2))
  plot(b)
  predict(b)
}
Now let's create the data frame again (we removed it earlier):
dat <- data.frame(x = runif(10), y = rnorm(10))
Can you see that
fitting(y ~ x, dat, keep.mf = TRUE)
works but
fitting(y ~ x, dat, keep.mf = FALSE)
fails?
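One way to see why the second call fails is to inspect what lm stored. The sketch below assumes a stripped-down variant of fitting (called make_fit here, a hypothetical name) that only returns the fit, so the stored call can be examined:

```r
# Hypothetical variant of `fitting` that just returns the lm object
make_fit <- function(myformula, mydata, keep.mf = FALSE) {
  lm(formula = myformula, data = mydata, model = keep.mf)
}

dat <- data.frame(x = runif(10), y = rnorm(10))
b <- make_fit(y ~ x, dat, keep.mf = FALSE)

b$call                          # refers to `myformula` and `mydata`, not `dat`
env <- environment(formula(b))  # where y ~ x was created, not inside make_fit
exists("mydata", envir = env)   # is the name used in the call visible there?

# Rebuilding the model frame re-evaluates the call in `env`, so it errors
res <- tryCatch(predict(b), error = function(e) conditionMessage(e))
res
```

The stored call only makes sense inside the function that issued it, while the formula carries the environment of the caller. With model = TRUE nothing has to be re-evaluated, so the mismatch never matters.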
Here is a question I answered and investigated a year ago: "R - model.frame() and non-standard evaluation". It was asked about the survival package. That example is really extreme: even if we provide newdata, we would still get an error. Retaining the model frame is the only way to proceed!
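Finally, on your observation about memory costs. In fact, $model is not the main reason an lm object can be large. $qr is, as it has the same dimensions as the model matrix. Consider a model with lots of factors: the model frame stores each factor once, while the model matrix expands it into many dummy columns, so the model frame is much smaller than the model matrix. (Nonlinear terms like bs, ns or poly behave differently: their basis matrix is already stored, fully expanded, in the model frame.) So omitting the model frame still leaves the largest component, $qr, in place and does not help much in reducing the size of an lm object. This is actually one motivation for the development of biglm.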
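The size comparison can be checked directly with object.size. A minimal sketch, assuming a single factor with 26 levels and made-up data (the variable names and sizes are illustrative only):

```r
set.seed(1)                                   # assumption: seed for reproducibility
n <- 5000
d <- data.frame(
  g = factor(rep(letters, length.out = n)),   # one factor with 26 levels
  y = rnorm(n)
)
m <- lm(y ~ g, data = d)

dim(m$model)            # model frame: one column for y, one for the factor
dim(m$qr$qr)            # QR factor: same shape as the model matrix

object.size(m$model)    # size of the model frame
object.size(m$qr)       # size of the QR component
```

Here the factor occupies a single column in the model frame but one column per coefficient in the model matrix, so $qr is expected to be far larger than $model.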
Since I have inevitably mentioned biglm, I would emphasize again that this method only helps reduce the size of the final model object, not RAM usage during model fitting.