
This is a follow-up question to Error in calling `lm` in a `lapply` with `weights` argument, but it may not be the same problem (though it is related).

Here is a reproducible example:

dd <- data.frame(y = rnorm(100),
                 x1 = rnorm(100),
                 x2 = rnorm(100),
                 x3 = rnorm(100),
                 x4 = rnorm(100),
                 wg = runif(100,1,100))

ls.form <- list(
  formula(y~x1+x2),
  formula(y~x3+x4),
  formula(y~x1|x2|x3),
  formula(y~x1+x2+x3+x4)
)

I have a function that takes four arguments (1- a subsample of row indices, 2- the data.frame to use, 3- a list of formulas to try and 4- the column name to use for the weights argument):

f1 <- function(samp, dat, forms, wgt){
  baselm <- lm(y~x1, data = dat[samp,], weights = dat[samp,wgt])
  lapply(forms, update, object = baselm)
}

If I call the function, I get an error:

f1(1:66, dat = dd, forms = ls.form, wgt = "wg")
 Error in is.data.frame(data) : object 'dat' not found 

I don't really understand why it doesn't find the dat object, since it should be part of the function environment. The problem is in the update step: if I remove that line from the function, the code works.
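A minimal diagnostic sketch, assuming a cut-down version of `dd` (the helper `show_call` is hypothetical). `lm()` stores its unevaluated call, and `update.default()` re-evaluates the modified call with `eval(call, parent.frame())`. When `update` is called through `lapply(forms, update, ...)`, that parent frame is `lapply`'s own frame rather than `f1`'s, so the names in the stored call cannot be resolved there:

```r
set.seed(1)
dd <- data.frame(y = rnorm(100), x1 = rnorm(100), wg = runif(100, 1, 100))

# Hypothetical helper: fit inside a function, return the call lm() stored
show_call <- function(samp, dat, wgt) {
  baselm <- lm(y ~ x1, data = dat[samp, ], weights = dat[samp, wgt])
  getCall(baselm)
}

cl <- show_call(1:66, dat = dd, wgt = "wg")
print(cl)   # still refers to the local names dat, samp and wgt

# Re-evaluating that call in a frame where `dat` is not visible
# gives the same "object 'dat' not found" error as in the question
msg <- tryCatch(eval(cl, envir = new.env()),
                error = function(e) conditionMessage(e))
print(msg)
```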

In the end, this function will be called with lapply:

lapply(list(1:66, 33:99), f1, dat=dd, forms = ls.form, wgt="wg")
Bastien

3 Answers


I think your problems are due to the scoping rules used by lm, which are, quite frankly, a pain in the r-squared.

One option is to use do.call to get it to work, but you get some ugly output: do.call passes the already-evaluated inputs, so the call shown by the standard print method contains the deparsed data.

f1 <- function(samp, dat, forms, wgt){
  baselm <- do.call(lm,list(formula=y~x1, data = dat[samp,], weights = dat[samp,wgt]))
  lapply(forms, update, object = baselm)
}
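A sketch of why the `do.call` version works, assuming a cut-down `dd` and two simple formulas (the name `f_docall` is hypothetical). Because `do.call` builds the call from evaluated arguments, the call that `lm` stores holds the data frame and weight vector themselves instead of the names `dat`, `samp` and `wgt`, so `update()` can re-evaluate it from any frame:

```r
set.seed(1)
dd <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                 wg = runif(100, 1, 100))

# Hypothetical name; same body as the do.call option above
f_docall <- function(samp, dat, forms, wgt) {
  baselm <- do.call(lm, list(formula = y ~ x1, data = dat[samp, ],
                             weights = dat[samp, wgt]))
  lapply(forms, update, object = baselm)
}

fits <- f_docall(1:66, dat = dd, forms = list(y ~ x1 + x2, y ~ x2),
                 wgt = "wg")

# The stored call contains the evaluated data frame, not the symbol `dat`
is.data.frame(getCall(fits[[1]])$data)
```

That embedded data frame is also what makes the printed call so long.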

A better way is to use an eval(substitute(...)) construct, which gives the output you originally expected:

f2 <- function(samp, dat, forms, wgt){
  baselm <- eval(substitute(lm(y~x1, data = dat[samp,], weights = dat[samp,wgt])))
  lapply(forms, update, object = baselm)
}
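To see what the `eval(substitute(...))` construct actually evaluates, it helps to look at the expression before `eval()`. Inside a function, `substitute()` replaces each formal argument with the expression the caller supplied, so the call stored in the model refers to the caller's `dd` rather than the local `dat`. A sketch, assuming a cut-down `dd` (the helper `show_substituted` is hypothetical):

```r
set.seed(1)
dd <- data.frame(y = rnorm(100), x1 = rnorm(100), wg = runif(100, 1, 100))

# Hypothetical helper: return the expression that f2 would pass to eval()
show_substituted <- function(samp, dat, wgt) {
  substitute(lm(y ~ x1, data = dat[samp, ], weights = dat[samp, wgt]))
}

expr <- show_substituted(1:66, dat = dd, wgt = "wg")
print(expr)
# lm(y ~ x1, data = dd[1:66, ], weights = dd[1:66, "wg"])

fit <- eval(expr)  # evaluated here, where dd is visible
```

The flip side is that the re-evaluated call now depends on `dd` being visible from wherever `update()` runs (here, the global environment).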
James
  • Both your options worked with my main code, thanks! I have a feeling they're slower than my original code (without the weight argument), but I haven't benchmarked them since it's still very usable. The `do.call` version was the fastest. Anyway, it's a very obscure fix for a very obscure bug... Both of my bugs today were weird and will require more study for me to really understand what happened! – Bastien Dec 20 '17 at 16:37
  • Section 8.1.70 in the 8th Circle of The R Inferno touches on these issues: http://www.burns-stat.com/pages/Tutor/R_inferno.pdf – James Dec 20 '17 at 17:13

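Such scoping issues are very common with lm objects. You can solve this by specifying the correct environment for evaluation, i.e. having update return the unevaluated calls (evaluate = FALSE) and then evaluating them in the function's own environment: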

f1 <- function(samp, dat, forms, wgt){
  baselm <- lm(y~x1, data = dat[samp,], weights = dat[samp,wgt])
  mods <- lapply(forms, update, object = baselm, evaluate = FALSE)
  e <- environment()
  lapply(mods, eval, envir = e)
}

f1(1:66, dat = dd, forms = ls.form, wgt = "wg")
#works
Roland
  • Your example works every time, but when I apply it to my actual problem I still get `object 'dd.temp.ref' not found`. I managed to make it work by loading the different arguments into the global environment, but as soon as I restart R, it doesn't work anymore... I'll keep digging and get back to you if I find a way to reproduce it. – Bastien Dec 20 '17 at 16:03
  • I don't know what is wrong, but my call `f2 <- function(ref.id, dat, ff, wgt){ dd.temp.ref <- dat[ref.id,]; baselm <- lm(ff[[length(ff)]], data = dd.temp.ref, weights = dd.temp.ref[,wgt]); mods <- lapply(ff, update, object = baselm, evaluate = FALSE); e <- environment(); all.mod <- lapply(mods, eval, envir = e) }` is practically identical to yours but fails??? Is it my computer?? – Bastien Dec 20 '17 at 16:17
  • No, it's not the same. You are passing a formula to lm programmatically, which most likely creates an additional scoping issue. You should compute on the language to insert the formula into the lm expression and then evaluate that expression in the function environment. I have an answer showing how to do that, but searching for it is already difficult when I'm not doing it on my phone. – Roland Dec 20 '17 at 17:03
  • You're right, the problem is in how I extract a formula from the list... I would not have expected that, considering that `identical(formula(y~x1+x2), ls.form[[1]])` is `TRUE`... So many weird bugs today... @James's solutions work, so there is no rush, but I'm interested in seeing the solution you are talking about. Thanks for posting it when you get a chance. – Bastien Dec 20 '17 at 17:14
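The "additional scoping issue" appears to come from the formula's environment. As documented in `?lm`, `weights` is evaluated like the formula variables: first in `data`, then in `environment(formula)`, not in the frame `lm()` was called from. A formula built at top level carries the top-level environment, where `dd.temp.ref` does not exist. `identical()` compares that environment too, which is why the check in the comment above returned `TRUE`: both formulas were created at top level. Below is a sketch of the "compute on the language" idea under those assumptions: strip each formula to a bare call, splice it into the `lm()` expression with `bquote()`, and evaluate it in the function's environment. The names `make_inside` and `fit_all` are hypothetical, and this is an illustration, not the answer Roland refers to.

```r
set.seed(1)
dd <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                 wg = runif(100, 1, 100))
ls.form <- list(y ~ x1, y ~ x1 + x2)   # created at top level

# identical() also compares the formula's .Environment attribute
make_inside <- function() y ~ x1       # hypothetical helper
identical(ls.form[[1]], y ~ x1)        # TRUE: same expression, same environment
identical(ls.form[[1]], make_inside()) # FALSE: same expression, other environment

# Hypothetical sketch: splice each formula into an lm() call as a bare
# expression and evaluate that call in this function's environment
fit_all <- function(ref.id, dat, ff, wgt) {
  dd.temp.ref <- dat[ref.id, ]
  e <- environment()
  lapply(ff, function(fo) {
    attributes(fo) <- NULL   # drop class and .Environment: now a plain call
    cl <- bquote(lm(.(fo), data = dd.temp.ref, weights = dd.temp.ref[, wgt]))
    # `~` is re-evaluated inside e, so the new formula's environment is e,
    # where dd.temp.ref and wgt are found when the weights are evaluated
    eval(cl, envir = e)
  })
}

fits <- fit_all(1:66, dat = dd, ff = ls.form, wgt = "wg")
fits[[2]]$call   # reads like a hand-written lm() call
```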

The accepted answer works, but I kept digging and found this old r-help question (here), which gives more options and explanations. I'm posting it here in case somebody else needs it.

Bastien