How to access the source data when manipulating/updating fitted models

Question

I'm trying to write some functions to make refitting multiple models easier, but I find it painful, because R is unable to locate the proper data once evaluation goes deeper into the call stack. Although some effort was made to store the formula environment inside the model, I guess there's really no way to unambiguously point to the raw data object. This becomes even harder when fitting survival curves with survfit, which stores no terms object at all.

Do I really need to retype the data/formula as a parameter each time?

Example:

library(magrittr)  # provides the %>% pipe used below
# model-fitting wrapper function
fn <- function(fn_formula, fn_data) {
    lm(formula = fn_formula, data = fn_data)
}
# specify exemplary data and formula
data <- data.frame(
    y = rnorm(100),
    x1 = rnorm(100),
    x2 = rnorm(100))
formula <- y ~ x1

# try to create and update the fit with different parameters
fn_fit <- fn(formula, data)
update(fn_fit, ~ x2)
# Error in is.data.frame(data) : object 'fn_data' not found
terms(fn_fit) %>% attr('.Environment')
# <environment: R_GlobalEnv>
terms(fn_fit$model) %>% attr('.Environment')
# <environment: R_GlobalEnv>
getCall(fn_fit)
# lm(formula = fn_formula, data = fn_data)
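A minimal sketch of what update() is trying to do here. It relies on the `evaluate` argument of update() to show the rebuilt call before it runs. The data frame is renamed `dat` (an assumption made only so it does not mask utils::data). The stored call still names `fn_data`, which existed only inside the finished fn() frame. Passing the data again is the "retyping" the question wants to avoid:

```r
set.seed(1)
fn <- function(fn_formula, fn_data) {
  lm(formula = fn_formula, data = fn_data)
}
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
fn_fit <- fn(y ~ x1, dat)

# Build the updated call without evaluating it: it still refers to 'fn_data',
# a name that only existed inside the (now finished) call to fn()
new_call <- update(fn_fit, ~ x2, evaluate = FALSE)
print(new_call)

# Supplying the data explicitly resolves the lookup, at the cost of retyping it
fit2 <- update(fn_fit, ~ x2, data = dat)
coef(fit2)
```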

score 3 · Answer 1 · answered Jun 26 '17 at 22:15

The variable that stores the data must be visible under the same name both where lm() is evaluated and where update() is evaluated. I'm not sure what you are really trying to accomplish, but if you want a function that creates a call signature you can use in the global environment, something like this would work:

fn <- function(fn_formula, fn_data) {
  do.call("lm", list(fn_formula, data=substitute(fn_data)))
}
fn_fit <- fn(formula, data)
update(fn_fit, ~ x2)
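The same idea can be sketched with bquote() instead of do.call(). The caller's expression for the data (captured with substitute()) is spliced into the lm() call, and that call is then evaluated in the caller's frame. `fn_bq` is a hypothetical name, and this is one possible variant rather than the answer's own code. The stored call ends up referring to `dat`, a name that still exists where update() is later called:

```r
set.seed(1)
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Hypothetical variant of fn(): splice the caller's data expression into the call
fn_bq <- function(fn_formula, fn_data) {
  d <- substitute(fn_data)                 # the unevaluated argument, here `dat`
  cl <- bquote(lm(formula = .(fn_formula), data = .(d)))
  eval(cl, parent.frame())                 # evaluate where fn_bq() was called
}

fit <- fn_bq(y ~ x1, dat)
getCall(fit)$data            # the symbol `dat`, not `fn_data`
fit2 <- update(fit, ~ x2)    # works because `dat` exists where update() runs
```

Like the do.call() version, this breaks if `dat` is later removed or renamed in the calling environment.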

Otherwise, if you really want to capture that variable in the local function scope, you can create a helper that runs update in the correct environment:

fn <- function(fn_formula, fn_data) {
  environment(fn_formula) <- environment()
  lm(formula = fn_formula, data = fn_data)
}

fn_update <- function(object, ...) {
  mc<-match.call(definition = update)
  mc[[1]] <- quote(update)
  eval(mc, envir=environment(terms(object)))
}

fn_fit <- fn(formula, data)
fn_update(fn_fit, ~x2)
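A minimal sketch of why the helper works. The environment attached to the terms object is the frame of the finished fn() call, and that frame still holds `fn_data`. The data frame is renamed `dat` here, an assumption for the example only:

```r
set.seed(1)
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

fn <- function(fn_formula, fn_data) {
  environment(fn_formula) <- environment()   # re-home the formula in fn()'s frame
  lm(formula = fn_formula, data = fn_data)
}
fn_fit <- fn(y ~ x1, dat)

# The frame of the finished fn() call survives, referenced from the terms object
env <- environment(terms(fn_fit))
ls(env)                                      # "fn_data" "fn_formula"
identical(get("fn_data", envir = env), dat)  # TRUE: the data travel with the fit
```

The trade-off is that the data frame stays in memory for as long as the fitted model does.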

I'm aware that it's possible to capture the relevant variables with non-standard evaluation and by manipulating calls, but I find it unintuitive. It would require multiple wrappers, assume that the model has not already been fitted with a standard call, and depend heavily on the actual call stack. What I really do not understand is the point of keeping, in `terms`, the whole calling environment of the `y ~ x1` call. It gives no guarantee of retaining the original data, but it may hold other memory-consuming variables created during the same call of a wrapper function. Is there any simple explanation? – mjktfw, Jul 03 '17 at 19:52
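One part of the explanation, as far as model.frame() is documented: variables that are not columns of `data` are looked up in `environment(formula)`. A formula therefore has to remember where it was created, so that it can reference local objects such as weights or constants. A minimal sketch with hypothetical names (`make_fit`, `w`, `big`) illustrates both the benefit and the memory cost raised in the comment:

```r
set.seed(1)
make_fit <- function(d) {
  big <- numeric(1e6)   # unrelated local object, kept alive by the formula
  w <- 2                # used by the formula, but not a column of d
  f <- y ~ I(w * x1)    # f's environment is this function's frame
  lm(f, data = d)
}
d <- data.frame(y = rnorm(10), x1 = rnorm(10))
fit <- make_fit(d)      # `w` is found through environment(f), not in d

env <- environment(terms(fit))
ls(env)                 # "big" "d" "f" "w": the whole frame is retained
```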

score 1 · Answer 2 · answered Jun 27 '17 at 04:28

When you passed the formula, the only variables stored in the `model` element (the model frame) were those the formula needed:

> names(fn_fit$model)
[1] "y"  "x1"
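A minimal sketch of what the stored model frame can and cannot do, assuming a plain `lm()` fit on a data frame renamed `dat`. The frame contains only the variables the formula used. A refit that needs `x2` cannot be served from it, whereas one that uses only variables already in the frame can:

```r
set.seed(1)
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
fit <- lm(y ~ x1, data = dat)

names(fit$model)             # "y" "x1": only the variables the formula used
"x2" %in% names(fit$model)   # FALSE, so ~ x2 cannot be refitted from it

# A model using only variables already in the frame can be refitted from it
fit_null <- update(fit, . ~ 1, data = fit$model)
```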

But there's nothing named either 'data' or 'fn_data' in that object. MrFlick's second suggestion is more resilient to changes in the chain of calling frames:

> fn <- function(fn_formula, fn_data) {
+   do.call("lm", list(fn_formula, data=substitute(fn_data)))
+ }
> fn_fit <- fn(formula, data); rm(data)  # mess with the calling environment
> update(fn_fit, ~ x2)
Error in terms.formula(formula, data = data) : 
  'data' argument is of the wrong type

That error occurred because, after rm(data), the only object named data that R could find was the function utils::data. If you instead use the second option, you get:
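A minimal sketch of that lookup, assuming a standard session in which the utils package is attached. Once no user object named `data` exists in the global environment, the name resolves further along the search path to the function utils::data. That object is not a data frame, which is why terms.formula() complains:

```r
# Remove any user object called 'data' so the search path decides what it means
if (exists("data", envir = globalenv(), inherits = FALSE)) {
  rm(data, envir = globalenv())
}
found <- get("data")              # first object named 'data' on the search path
identical(found, utils::data)     # TRUE
is.data.frame(found)              # FALSE, hence "of the wrong type"
```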

> data <- data.frame(
+     y = rnorm(100),
+     x1 = rnorm(100),
+     x2 = rnorm(100))

> fn <- function(fn_formula, fn_data) {
+   environment(fn_formula) <- environment()
+   lm(formula = fn_formula, data = fn_data)
+ }
> 
> fn_update <- function(object, ...) {
+   mc<-match.call(definition = update)
+   mc[[1]] <- quote(update)
+   eval(mc, envir=environment(terms(object)))
+ }

> 
> fn_fit <- fn(formula, data) ; rm(data)
> fn_update(fn_fit, ~x2)

Call:
lm(formula = y ~ x2, data = fn_data)

Coefficients:
(Intercept)           x2  
    0.01117     -0.13004
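Another option, not taken from either answer, is to store the data frame itself (rather than a name for it) in the fitted object's call. update() then does not depend on any environment. `fn_embed` is a hypothetical name, and this is only a sketch:

```r
set.seed(1)
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Hypothetical variant: the call carries the data by value, not by name
fn_embed <- function(fn_formula, fn_data) {
  fit <- lm(formula = fn_formula, data = fn_data)
  fit$call$data <- fn_data   # replace the symbol fn_data with the data frame
  fit
}

fit <- fn_embed(y ~ x1, dat)
rm(dat)                      # no name needs to survive anywhere
fit2 <- update(fit, ~ x2)    # still works: the data are inside the call
```

The cost is that printing the model dumps the whole data frame in the `Call:` line, and the object grows by the size of the data.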
