How to use transformations of variables in formulas in R

Question

I'm trying to use transformations of my outcome variable (`outcomevar`) in a function that runs a few model variants and stores the results in a list.

The `runpanels` function first calls the `preparedata` function, which creates the lagged and differenced versions of the outcome variable named in the function argument. So after `preparedata`, `modeldata` contains `outcomevar`, `doutcomevar` and `loutcomevar`.

My problem is that I now need to refer to these transformations of `outcomevar` to subset the data so that `loutcomevar` and `doutcomevar` are not zero. I then need to use `doutcomevar` and `loutcomevar` in the models.

     set.seed(1)
     df <- data.frame(firm=rep(LETTERS[1:5],each=10),
           date=as.Date("2014-01-01")+1:10,
           y1=sample(1:100,50),y2=sample(1:100,50),y3=sample(1:100,50),
           x1=sample(1:100,50), x2=sample(1:100,50))

     preparedata<-function(testData,outcomevar){
     require(data.table)
     DT <- as.data.table(testData)
     setkey(DT,firm,date)
     DT[,lag  := c(NA,unlist(.SD)[-.N]),  by=firm, .SDcols=outcomevar]
     DT[,diff := c(NA,diff(unlist(.SD))), by=firm, .SDcols=outcomevar]
     setnames(DT,c("lag","diff"),paste0(c("loutcomevar","doutcomevar")))
     return(DT)
     modeldata<-as.data.frame(DT)
     }


     runpanels <- function(testData,outcomevar)  {
     modeldata<-preparedata(testData,outcomevar)  
     modeldata<-subset(modeldata,loutcomevar!=0& doutcomevar!=0)
     modellist<-list()
     modellist$m1<-lm(log(outcomevar)~-1+x1+x2,data=modeldata)  
     modellist$m2<-lm(log(doutcomevar)~-1+x1+date,data=modeldata)
     modellist$m3<-lm(log(outcomevar)~-1+log(loutcomevar)+x1+x2,data=modeldata)
     return(modellist)
     }
     # Example use:
     modelsID1<-runpanels(df,outcomevar="y1")

Unsurprisingly, I get an error when it gets to evaluating `loutcomevar!=0`:

     Error in eval(expr, envir, enclos) : object 'loutcomevar' not found
     Called from: eval(e, x, parent.frame())

So it does not find the lagged variable I created in `preparedata` in the environment of `runpanels`.

How can I refer to those variables?

The example solution below, from another question, uses the model's stored call and is similar to my problem, but I also want to refer to transformations of my `outcomevar`, which is an argument of the function. Any ideas on how to tackle this would be much appreciated!

Example solution from the other question that was kind of similar:

     air <- data(airquality)
     fm <- lm(Ozone ~ Solar.R, data=airquality)

     myfun <- function(fm, name){
     dn <- fm$call[['data']]
     varname <- deparse(substitute(name))
     get(as.character(dn),envir=.GlobalEnv)[varname]
     }
     # Usage:
     myfun(fm, Temp)

Where are you defining `loutcomevar` and `doutcomevar`? (They were never really the variable character values.) If you run `preparedata` with df and "y1", you get 'dy1' and 'ly1', not the names that are throwing errors inside `subset`. Read the `subset` help page more carefully. It specifically warns you to expect difficulties when using it within functions. — IRTFM, Jul 05 '14 at 18:23
Yes, that's the problem. I don't know how to define them such that this runs. — TinaW, Jul 05 '14 at 18:35
I was offering a possible starting point. It's really two separate questions: one about subsetting with `[[` or `[`, and one about building formulas. There are lots of worked examples on SO on building formula objects. — IRTFM, Jul 05 '14 at 18:39
Thank you! You gave me an idea. I updated the definition above in the `setnames` call to keep the names generically defined as "loutcomevar" and "doutcomevar". Now I get the error when running the first model, since it does not find the variable "outcomevar". So I need to create a new variable equal to the outcome variable specified as the argument, and then it should work, I guess. This will not use the original name, but rather a placeholder `outcomevar` that stands in for the correct name. — TinaW, Jul 05 '14 at 18:48
     preparedata<-function(testData,outcomevar){
     require(data.table)
     DT <- as.data.table(testData)
     setkey(DT,firm,date)
     DT[,lag  := c(NA,unlist(.SD)[-.N]),  by=firm, .SDcols=outcomevar]
     DT[,diff := c(NA,diff(unlist(.SD))), by=firm, .SDcols=outcomevar]
     setnames(DT,c("lag","diff"),paste0(c("loutcomevar","doutcomevar")))
     DT$outcomevar <- with(DT, eval(parse(text=outcomevar)))
     return(DT)
     modeldata<-as.data.frame(DT)
     }
— TinaW, Jul 05 '14 at 19:03
I tried the idea you gave me, and if I redefine `preparedata` as above, it works. It would be nicer if I could keep the original name of the variable, since it is specified as an argument, but at least this works flexibly for the 3 outcome variables. Thank you for your suggestions, BondedDust! — TinaW, Jul 05 '14 at 19:04

score 1 · Accepted Answer · answered Jul 05 '14 at 18:37

You are assuming far too much capacity on the part of the R interpreter to think like you do. Its powers of abstraction are much more limited. In particular, there is no interpretation that would allow `doutcomevar` and `loutcomevar` to be constructed from the value of `outcomevar` within a formula or in the `subset` call.

Something along these (untested) lines might work:

     runpanels <- function(testData,outcomevar)  {
     modeldata<-preparedata(testData,outcomevar)
     # Assumes preparedata() names the new columns paste0("l", outcomevar)
     # and paste0("d", outcomevar), e.g. "ly1" and "dy1"
     lvar <- paste0("l", outcomevar)
     dvar <- paste0("d", outcomevar)
     idx <-  modeldata[[ lvar ]] != 0 &
             modeldata[[ dvar ]] != 0
     modeldata<-modeldata[which(idx) ,]   # which() drops rows where idx is NA
     modellist<-list()
     # Build each formula as a string, then convert it with as.formula()
     form1 <- as.formula( paste0("log(", outcomevar, ")~-1+x1+x2") )
     form2 <- as.formula( paste0("log(", dvar, ")~-1+x1+date") )
     form3 <- as.formula( paste0("log(", outcomevar, ")~-1+log(", lvar, ")+x1+x2") )
     modellist$m1<-lm(form1,data=modeldata)
     modellist$m2<-lm(form2,data=modeldata)
     modellist$m3<-lm(form3,data=modeldata)
     return(modellist)
     }
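
As a self-contained illustration of the two ideas in this answer (indexing a column by a constructed name with `[[`, and building the formula from a string), here is a minimal base R sketch. The helper name `fit_outcome` and the toy data are hypothetical and not part of the question; the lag is computed on a plain vector here rather than with data.table and ignores the firm grouping.

```r
# Toy data: column names follow the question, values are made up
set.seed(1)
toy <- data.frame(y1 = sample(1:100, 20),
                  x1 = sample(1:100, 20),
                  x2 = sample(1:100, 20))

# Hypothetical helper: fits log(<outcome>) ~ -1 + log(l<outcome>) + x1 + x2
fit_outcome <- function(dat, outcomevar) {
  lvar <- paste0("l", outcomevar)                     # e.g. "ly1"
  dat[[lvar]] <- c(NA, head(dat[[outcomevar]], -1))   # lag by one row
  # Keep rows where the lag exists and is non-zero
  keep <- !is.na(dat[[lvar]]) & dat[[lvar]] != 0
  dat <- dat[keep, , drop = FALSE]
  # The formula is assembled as text, so the real column names end up in it
  form <- as.formula(paste0("log(", outcomevar, ") ~ -1 + log(", lvar, ") + x1 + x2"))
  lm(form, data = dat)
}

fit <- fit_outcome(toy, "y1")
formula(fit)   # shows the formula with the actual variable names
```

Because the formula carries the real names, the coefficient table of `fit` refers to `log(ly1)` rather than to a placeholder.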

score 0 · Answer 2 · answered Jul 05 '14 at 23:37

     set.seed(1)
     df <- data.frame(firm=rep(LETTERS[1:5],each=10),
           date=as.Date("2014-01-01")+1:10,
           y1=sample(1:100,50),y2=sample(1:100,50),y3=sample(1:100,50),
           x1=sample(1:100,50), x2=sample(1:100,50))

     preparedata<-function(testData,outcomevar){
     require(data.table)
     DT <- as.data.table(testData)
     setkey(DT,firm,date)
     DT[,lag  := c(NA,unlist(.SD)[-.N]),  by=firm, .SDcols=outcomevar]
     DT[,diff := c(NA,diff(unlist(.SD))), by=firm, .SDcols=outcomevar]
     setnames(DT,c("lag","diff"),paste0(c("loutcomevar","doutcomevar")))
     DT$outcomevar <- with(DT, eval(parse(text=outcomevar)))
     return(DT)
     modeldata<-as.data.frame(DT)
     }

     runpanels <- function(testData,outcomevar)  {
     modeldata<-preparedata(testData,outcomevar)
     modeldata<-subset(modeldata,loutcomevar!=0& doutcomevar!=0)
     modellist<-list()
     modellist$m1<-lm(log(outcomevar)~-1+x1+x2,data=modeldata)
     modellist$m2<-lm(log(doutcomevar)~-1+x1+date,data=modeldata)
     modellist$m3<-lm(log(outcomevar)~-1+log(loutcomevar)+x1+x2,data=modeldata)
     return(modellist)
     }

     # Example use:
     modelsID1<-runpanels(df,outcomevar="y1")
     modelsID1<-runpanels(df,outcomevar="y2")
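
The placeholder column works, but the fitted models then no longer show which outcome was used. A possible variation that keeps the original names is sketched below in base R, under stated assumptions: `runpanels_named` is a hypothetical name, the lag and difference columns are named after the outcome (e.g. `ly2`, `dy2`), the toy data only mimics the shape of `df`, and only the first and third models are shown. It uses `reformulate()` from the stats package, passing the left-hand side as a call.

```r
# Toy data in the shape of the question's df (values are made up)
set.seed(1)
toy <- data.frame(firm = rep(c("A", "B"), each = 10),
                  y2 = sample(1:100, 20),
                  x1 = sample(1:100, 20),
                  x2 = sample(1:100, 20))

# Hypothetical variant of runpanels() that keeps the outcome's own name
runpanels_named <- function(dat, outcomevar) {
  lvar <- paste0("l", outcomevar)
  dvar <- paste0("d", outcomevar)
  # ave() applies the lag within each firm and preserves row order
  dat[[lvar]] <- ave(dat[[outcomevar]], dat$firm,
                     FUN = function(v) c(NA, head(v, -1)))
  dat[[dvar]] <- dat[[outcomevar]] - dat[[lvar]]
  # which() drops rows where the condition is NA (first row of each firm)
  dat <- dat[which(dat[[lvar]] != 0 & dat[[dvar]] != 0), ]
  lhs <- call("log", as.name(outcomevar))   # the unevaluated call log(y2)
  list(
    m1 = lm(reformulate(c("x1", "x2"), response = lhs, intercept = FALSE),
            data = dat),
    m3 = lm(reformulate(c(paste0("log(", lvar, ")"), "x1", "x2"),
                        response = lhs, intercept = FALSE),
            data = dat)
  )
}

models <- runpanels_named(toy, "y2")
names(coef(models$m3))
```

The second model from the question could be added the same way with `call("log", as.name(dvar))` as the response; note that taking the log of a negative difference yields `NaN`, so those rows would be dropped by `lm`.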
