How to change generic lm() coefficient names in a list of lm objects for use in predict()?

Question

I'm trying to write a function that creates a list of lm objects from a dataframe, with a different lm for each unique date in my data set. I would then like to be able to pass these lms into predict() with a new dataframe to generate predicted values and confidence intervals.

Data looks like this:

Date        ppm area
10/18/2019  0   0
10/18/2019  0   0
10/18/2019  0.1 438.9804
10/18/2019  0.1 447.1784
10/18/2019  0.1 443.7794
10/18/2019  1   3232.2088
10/18/2019  1   3206.6672
10/18/2019  1   3206.232
10/24/2019  0   0
10/24/2019  0   15.98
10/24/2019  0   0
10/24/2019  0   0
10/24/2019  0.1 379.387
10/24/2019  0.1 325.5268
10/24/2019  0.1 325.8126
10/24/2019  0.1 310.5972
10/24/2019  1   3259.366
10/24/2019  1   3218.0836
10/24/2019  1   3192.7076

The first part seems simple - writing a function that creates a different lm for each date:

standard.lm= function(standards,
                          date_field = "date",
                          peak_field,
                          std_field,
                          peak_field2 = NA){
  library(tidyverse)
  library(broom)


  y = standards %>% nest(-date_field) %>%
    mutate(fit = map(data, ~lm(.[[std_field]] ~ .[[peak_field]], data = .)))

    return(y)  }

Then I can run the command:

test = standard.lm(standard_data, std_field = "std.ppm", peak_field = "area")

This works well as to generate lms for each date, but the problem is that the coefficient is named.[[peak_field]] instead of "area"

This creates a problem for me, because I would like to pass these lm objects on to predict() to predict ppm values from area measurements. My column in the next data table would be named area and I can't rename it to .[[peak_field]]. I try something like this and I get an error:

a = c(1300.1, 1400.3, 1500.9)
df = data.frame(area = a)
df$std.ppm = predict(test$fit[[1]], newdata = df)

Error in $<-.data.frame(*tmp*, std.ppm, value = c(1 = -0.00299110569401364, : replacement has 8 rows, data has 3 In addition: Warning message: 'newdata' had 3 rows but variables found have 8 rows

This is happening because predict() is looking for a column named .[[peak_field]] instead of recognizing area, and is predicting values for the original input lm data instead of the data I want it to predict.

So basically I'm looking for a solution to overcome this issue. The best solution would allow me to specify coefficient names when I initially create the lm objects in the first function, but I would be ok with something that allows me to specify which column to use in predict()

Use computing on the language. I'd use bquote and eval but I'm sure there is an equivalent in the tidyverse for the cool kids. — Roland, Feb 22 '20 at 07:33
This might be a [starting point to catch up](https://dplyr.tidyverse.org/articles/programming.html#different-expressions) with the cool kids ;) — dario, Feb 22 '20 at 08:13

score 1 · Answer 1 · answered Feb 22 '20 at 09:08

You can try to create a formula in the function using your defined y and x variable:

standard.lm= function(standards,date_field = "Date",
                      peak_field,std_field,peak_field2 = NA){
  lm_form = as.formula(paste(std_field,"~",peak_field))
  #another away
  #lm_form = substitute(y~x,list(y=as.name(std_field),x=as.name(peak_field)))
  y = standards %>% nest(data=-one_of(date_field)) %>%
    mutate(fit = map(data, ~lm(lm_form, data = .)))

    return(y)  }

We test it:

standard_data = structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("10/18/2019", 
"10/24/2019"), class = "factor"), ppm = c(0, 0, 0.1, 0.1, 0.1, 
1, 1, 1, 0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 1, 1, 1), area = c(0, 
0, 438.9804, 447.1784, 443.7794, 3232.2088, 3206.6672, 3206.232, 
0, 15.98, 0, 0, 379.387, 325.5268, 325.8126, 310.5972, 3259.366, 
3218.0836, 3192.7076)), class = "data.frame", row.names = c(NA, 
-19L))

mdl = standard.lm(standard_data, std_field = "ppm", peak_field = "area")

predict(mdl$fit[[1]], data.frame(area=c(1300.1,1400.3)))
        1         2 
0.3897161 0.4215205

How to change generic lm() coefficient names in a list of lm objects for use in predict()?

1 Answers1