3

Having an lm object I need to create a function based on its variables represented as character vector. I have tried to use a combination of eval and expr to create an f function that would be further used in obj and nlm optimisation of the latter.

library(tidyverse)
df <- drop_na(airquality)
model <- lm(Ozone~. - Temp, data = df, x=TRUE, y=TRUE)
base_vars <- all.vars(formula(model)[-2])
k <- length(base_vars)

f <- function(base_df, x, y, parms) {
  with(base_df, parms[1] + 
         eval(expr(paste(paste(paste0('parms[', 2:(k+1), ']'), base_vars, sep = '*'), collapse = '+'))) + 
         log(parms[k+2] * (x - parms[k+3] ^ 2)))
}
obj <- function(parms, y, x) mean((residuals(model) - f(df, x, y, parms))^2) 
fit <- with(data, nlm(obj, c(0, 0, 0, 0, 0, 0, 0), y = e, x = x))

But calling f(model$x, df$Temp, model$y, c(0, 0, 0, 0, 0, 0, 0)) results in the following error:

Error in eval(substitute(expr), data, enclos = parent.frame()) : 
  numeric 'envir' arg not of length one 
4.
eval(substitute(expr), data, enclos = parent.frame()) 
3.
with.default(base_df, parms[1] + eval(expr(paste(paste(paste0("parms[", 
    2:(k + 1), "]"), base_vars, sep = "*"), collapse = "+"))) + 
    log(parms[k + 2] * (x - parms[k + 3]^2))) 
2.
with(base_df, parms[1] + eval(expr(paste(paste(paste0("parms[", 
    2:(k + 1), "]"), base_vars, sep = "*"), collapse = "+"))) + 
    log(parms[k + 2] * (x - parms[k + 3]^2))) 
1.
f(model$x, df$Temp, model$y, c(0, 0, 0, 0, 0, 0, 0))

I believe there might be a conflict between eval environment and environment implied by with function, but can't figure out why. Any ideas how can I create custom function f for variable models?

Expected output for the f(model$x, df$Temp, model$y, c(0, 0, 0, 0, 0, 0, 0)) would be:

with(base_df, parms[1]+parms[2]*Solar.R+parms[3]*Wind+parms[4]*Temp+parms[5]*Month+
              parms[6]*Day+log(parms[7] * (Temp - parms[8] ^ 2)))

but for a different model it could be something like:

with(base_df, 
     parms[1]+parms[2]*var1+parms[3]*var2+log(parms[4]*(var3-parms[5]^2)))

so the number of variables and parameters is different with every call.

Xaume
  • 293
  • 2
  • 16
  • Can you describe in words what exactly you want this `f` function to do? There may be a much better way than bothering with `eval` – MrFlick Feb 20 '20 at 15:59
  • 1
    I have absolutely no idea what your function is supposed to return there. A formula object? How should the formula look like? – Roland Feb 20 '20 at 16:00
  • The f function should create a function that has a linear part (using `model` variables) and non-linear part `log(A * (x-B))`. That function is needed to facilitate `obj` function that is supposed to be used in `nlm` optimisation. (I have edited the code accordingly) – Xaume Feb 20 '20 at 16:09
  • 2
    I still don't get it. Can you write down the formula for the linear part? I suspect I can provide an easy solution but your convoluted code attempt does not provide a clear requirement. – Roland Feb 20 '20 at 16:14
  • I have updated with expected output. Please see the post above. – Xaume Feb 21 '20 at 05:57

2 Answers2

5

R supports computing on the language, but it should not be your first option. If you do it, it should never involve text processing of code. You don't have a case here where you need to compute on the language. I have no idea how you thought your attempt would work but I don't know the expr function and I refuse to install package tidyverse and its ginormous dependency tree.

Also, you generally should avoid with outside of interactive use. But with is not the problem here.

Here is how I would do this:

df <- airquality[complete.cases(airquality),]
model <- lm(Ozone~. - Temp, data = df)

f <- function(base_df, x, parms) {

  m <- model.matrix(model, data = base_df)
  k <- ncol(m)
  stopifnot(length(parms) == (k + 2L))
  #I use exp(parms[k+1]) to ensure a positive value within the log
  m %*% parms[seq_len(k)] + log(exp(parms[k + 1L]) * (x - parms[k + 2L] ^ 2))

}

obj <- function(parms, y, x, base_df) mean((residuals(model) - f(base_df, x, parms))^2) 

#some x:
x <- rpois(nrow(df), 10)

fit <- nlm(obj, c(0, 0, 0, 0, 0, 0, 0), x = x, base_df = df)
#works

You don't seem to use y and thus I removed it from the code.

Note how I create the design matrix for the linear part (using model.matrix) and use matrix multiplication with the parameters. You also need to ensure that log doesn't return Inf/-Inf/NaN.

Roland
  • 127,288
  • 10
  • 191
  • 288
1

I think @Roland gave a good answer covering your actual problem. I am isolating what I think you were specifically asking based on the question Title, with no comment on whether it is a good idea or not. It probably isn't in this use case.

But what you were looking for more than likely is eval_tidy() from rlang. I left the :: function notation in just so its obvious what package is being used here.

Note I fixed a couple things that seemed to be errors in the code. I am also using all ones instead of zeros to test in parms due to the log.

library(rlang)
library(tidyr)

# dropped y since it was an unused argument
f <- function(base_df, x, parms) {
  # set an expression to evaluate using parse_expr()
  .f <- rlang::parse_expr(paste(paste(paste0('parms[', 2:(k+1), ']'),
                                      base_vars, sep = '*'), collapse = '+'))

  # use eval_tidy() with the data mask  
  y_part1 <- rlang::eval_tidy(.f, data = base_df)
  y_part2 <- log(parms[k + 2] * (x - parms[k + 3] ^ 2))

  parms[1] + y_part1 + y_part2
}

# using your code
df <- tidyr::drop_na(airquality)
model <- lm(Ozone~. - Temp, data = df, x=TRUE, y=TRUE)
base_vars <- all.vars(formula(model)[-2])
k <- length(base_vars)

# changed to all ones, I think this is what you wanted for length
parms <- rep(1, k + 3)

method_1 <- f(df, df$Temp, parms)

method_2 <- with(df, parms[1]+parms[2]*Solar.R+parms[3]*Wind+parms[4]*Temp+parms[5]*Month+
                   parms[6]*Day+log(parms[7] * (Temp - parms[8] ^ 2)))


all.equal(method_1, method_2)
# [1] TRUE