Call variables from a data frame as in `lm`

Question

I want to write a function that will generate a new variable based on relations specified by the user. For example, given the data frame:

d=structure(list(x1 = c(1.51402536388423, 2.46080908251235, 0.0820537335444602, 
0.397916902799275, 1.95703984456426, 0.339037316676135, -0.0983477082382985, 
-0.811438758653617, -0.22166264965645, -1.24251846727355), x2 = c(1.31813185688133, 
1.72398579121766, -0.193614904270392, 0.432834246728345, 1.59997674335209, 
0.600172345889666, -0.215380204258891, -0.561283409895365, 0.042565271836392, 
-1.19165094830462), x3 = c(0.811032464442614, 0.775382517472752, 
-0.513659338850135, 1.88476174946952, -0.609641201640788, -1.64673649834054, 
-2.0395881504007, -0.0752358173117906, -1.23648041024926, 2.4485419578765
)), .Names = c("x1", "x2", "x3"), row.names = c(NA, -10L), class = "data.frame")

The user may specify something like `y~.5*x1+.2*x2+.4*x3` to create a new variable `y`. This is trivially easy to do for one variable, but I don't know how to generalize it. Thus,

How do I write a function that identifies the variables selected and creates a new variable based on these weights?

I think the function would take two arguments (`NewVariable=function(model,data)`), but I'm not sure what to do next.

Note that this question is similar to the question: extract variables in formula from a data frame, except the user would specify "regression weights".

Would it be OK to have `y <- someFunction(".5*x1+.2*x2+.4*x3", d)` instead of the formula specification? – B.Shankar, May 23 '15 at 22:55
I think so... I didn't say this, but it would also need to accommodate interactions (e.g., `...+.5*x1*x2`). That is why I was thinking a model statement might be most intuitive. – User7598, May 23 '15 at 23:01

B.Shankar · Accepted Answer · 2015-05-23T23:52:47.000

This is a possible solution:

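modelFunction <- function(formula, data) {
  # Capture the unevaluated expression; evaluating `formula` directly would
  # look up x1, x2, ... in the caller's environment and fail.
  expr <- substitute(formula)
  apply(data, 1, function(rw) {
    .e <- environment()
    # Bind each value of the current row to a local variable named after its column.
    lapply(names(rw), function(varName) assign(x = varName, value = rw[varName], pos = .e))
    eval(expr, .e)
  })
}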

Call it like this:

y <- modelFunction(.5*x1+.2*x2+.4*x3, d)  # Note that the formula is unquoted

This will work with interaction terms as well.

EDIT:

A more concise solution (suggested by @MrFlick) passes the data frame as the `envir` argument of `eval`:

y <- eval(quote(.5*x1+.2*x2+.4*x3), d)
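
To get back to the two-sided form used in the question (`y ~ ...`), here is a minimal sketch that applies the same `eval` idea to a formula object. The function name `newVariable`, its argument names, and the small data frame `dd` are hypothetical and only for illustration. Note that `*` is evaluated here as ordinary multiplication; it is not expanded into main effects plus an interaction the way `lm` would do it.

```r
# Hypothetical helper (not from the thread): build a new column from a
# two-sided formula such as y ~ .5*x1 + .2*x2.
newVariable <- function(model, data) {
  stopifnot(inherits(model, "formula"),
            length(model) == 3L,      # two-sided: `~`, lhs, rhs
            is.name(model[[2]]))      # lhs must be a plain variable name
  lhs <- as.character(model[[2]])     # name of the column to create
  rhs <- model[[3]]                   # unevaluated right-hand side
  # Look up names in `data` first, then in the formula's own environment.
  data[[lhs]] <- eval(rhs, data, environment(model))
  data
}

# Made-up data for illustration.
dd <- data.frame(x1 = c(1, 2), x2 = c(3, 4), x3 = c(5, 6))
out <- newVariable(y ~ .5*x1 + .2*x2 + .4*x3 + .5*x1*x2, dd)
out$y  # 4.6 and 8.2, up to floating-point rounding
```

Because the arithmetic is vectorized over the columns, no row-wise loop is needed, and the weights can come from variables defined where the formula was written.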

edited May 23 '15 at 23:52

answered May 23 '15 at 23:16


You can just `eval()` using `data` as the `envir=` parameter. Something like: `dd<-data.frame(x1=1, x2=2, x3=3); eval(quote(.5*x1+.2*x2+.4*x3), dd)`. – MrFlick May 23 '15 at 23:33
Actually @MrFlick, when I try to wrap this in a function, `newData=function(formula,data) {eval(quote(formula),data)}`, the data doesn't attach (e.g., `newData(.5*x1,d)` produces an error but `newData(.5*d$x1)` does not). What is wrong? – User7598 May 24 '15 at 00:06
You need `newData=function(formula,data) {eval(substitute(formula),data)}` to prevent premature evaluation of the function parameter. – MrFlick May 24 '15 at 00:08
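
The difference between the two wrappers can be sketched as follows. The names `newDataQuote` and `newDataSubst` are made up for this comparison, and the explicit `parent.frame()` enclosure is an addition borrowed from the idiom base R uses in `with()`; it is not part of the comment above.

```r
# Made-up one-row data frame, as in the earlier comment.
dd <- data.frame(x1 = 1, x2 = 2, x3 = 3)

# quote(formula) yields the symbol `formula` itself. eval() then looks that
# symbol up, which forces the argument in the caller's environment, where
# x1 is not defined.
newDataQuote <- function(formula, data) eval(quote(formula), data)

# substitute(formula) yields the expression the caller typed, still
# unevaluated, so eval() can resolve x1, x2, x3 inside `data`.
newDataSubst <- function(formula, data) {
  eval(substitute(formula), data, parent.frame())
}

# TRUE as long as no x1 exists in the calling environment.
failed <- inherits(try(newDataQuote(.5*x1, dd), silent = TRUE), "try-error")
ok <- newDataSubst(.5*x1 + .2*x2 + .4*x3, dd)
```

Passing `parent.frame()` as the enclosure means that any name not found in `data` is looked up where the function was called rather than inside the wrapper.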
