User-Defined Function for lme model fits: error

Question

I am beginning to write a function that builds linear mixed models with nlme. I am encountering an error: Error in eval(expr, envir, enclos) : object 'value' not found, which I believe is due to R not knowing where to find the data frame variables (e.g., value). If this is, in fact, why the error is occurring, how do I tell the function that value and timepoint belong to the variables in Dat in the (reproducible) code below?

require(nlme)
Dat <- data.frame(
    id = sample(10:19),
    Time = sample(c("one", "two"), 10, replace = T),
    Value = sample(1:10)
)
nlme_rct_lmm <- function (data, value, timepoint,
                      ID) {

    #base_level intercept only model
    bl_int_only <- gls(value ~ 1, 
                       data = data, 
                       method = "ML", 
                       na.action="na.omit")        
    #vary intercept across participants
    randomIntercept <- lme(value ~ 1, 
                           data = data, 
                           random = ~1|ID, 
                           method = "ML", 
                           na.action = "na.omit")       
    #add timepoint as a fixed effect
    timeFE <- lme(value ~ timepoint, 
                  data = data, 
                  random = ~1|ID, 
                  method = "ML", 
                  na.action = "na.omit")
}
nlme_rct_lmm(Dat, Value, Time, id)

Ben Bolker · Accepted Answer · 2016-09-30T18:49:37.777

This isn't (as you and I both expected) a problem with evaluation within different frames; rather, it's an issue of consistency between the names of the variables between the formula and the data. R is case-sensitive, so it matters whether you use value or Value, id or ID, etc.. Furthermore, formula interpretation uses non-standard evaluation (NSE), so if you have a variable value equal to the symbol Value, value ~ 1 does not magically get transmuted to Value ~ 1. What I've outlined below works by passing the names of the response, time, and ID variables to the function, because it's the easiest approach. It's a little bit more elegant to the end-user if you use non-standard evaluation, but that's a bit harder to program (and therefore understand, debug, etc.).

Below the easy/boneheaded approach, I also discuss how to implement the NSE approach (scroll all the way down ...)

Note that your example doesn't return anything; with R, that means that all the results will be discarded when it finishes the function. You might want to return the results as a list (or perhaps your real function will do something other stuff with the fitted models, such as a series of model tests, and return those answers as the results ...)

require(nlme)

Dat <- data.frame(
    ID = sample(10:19),
    Time = sample(c("one", "two"), 10, replace = T),
    Value = sample(1:10)
)

nlme_rct_lmm <- function (data, value, timepoint,
                      ID) {

    nullmodel <- reformulate("1",response=value)
    fullmodel <- reformulate(c("1",timepoint),response=value)
    remodel <- reformulate(paste("1",ID,sep="|"))

    #base_level intercept only model
    bl_int_only <- gls(nullmodel,
                       data = data, 
                       method = "ML", 
                       na.action="na.omit")

    #vary intercept across participants
    randomIntercept <- lme(nullmodel,
                           data = data, 
                           random = remodel,
                           method = "ML", 
                           na.action = "na.omit")

    #add timepoint as a fixed effect
    timeFE <- lme(fullmodel,
                  data = data, 
                  random = remodel,
                  method = "ML", 
                  na.action = "na.omit")
}

nlme_rct_lmm(Dat, "Value", "Time", "ID")

If you want something a bit more elegant (but internally obscure) you can substitute the following lines for defining the models. The inner substitute() calls retrieves the symbols that were passed to the function as arguments; the outer substitute() calls insert those symbols into the formula.

nullmodel <- formula(substitute(v~1,list(v=substitute(value))))
fullmodel <- formula(substitute(v~t,list(v=substitute(value),
                                 t=substitute(timepoint))))
remodel <- formula(substitute(~1|i,list(i=substitute(ID))))

Now this would work, without specifying the variables as strings, as you expected: nlme_rct_lmm(Dat, Value, Time, ID)

That occurred to me but I thought that this might be just the beginning of the OP's example (e.g. that they might do some other stuff with the results before returning a summary value); worth mentioning in the answer anyway ... — Ben Bolker, Sep 30 '16 at 18:39
Excellent! Yes, the actual function is returning a series of model tests. I find the NSE approach to be, as mentioned, more end-user friendly. — Santi Allende, Sep 30 '16 at 20:05

User-Defined Function for lme model fits: error

1 Answers1