0

I'm trying to write a function that creates a list of lm objects from a dataframe, with a different lm for each unique date in my data set. I would then like to be able to pass these lms into predict() with a new dataframe to generate predicted values and confidence intervals.

Data looks like this:

Date        ppm area
10/18/2019  0   0
10/18/2019  0   0
10/18/2019  0.1 438.9804
10/18/2019  0.1 447.1784
10/18/2019  0.1 443.7794
10/18/2019  1   3232.2088
10/18/2019  1   3206.6672
10/18/2019  1   3206.232
10/24/2019  0   0
10/24/2019  0   15.98
10/24/2019  0   0
10/24/2019  0   0
10/24/2019  0.1 379.387
10/24/2019  0.1 325.5268
10/24/2019  0.1 325.8126
10/24/2019  0.1 310.5972
10/24/2019  1   3259.366
10/24/2019  1   3218.0836
10/24/2019  1   3192.7076

The first part seems simple - writing a function that creates a different lm for each date:

standard.lm= function(standards,
                          date_field = "date",
                          peak_field,
                          std_field,
                          peak_field2 = NA){
  library(tidyverse)
  library(broom)


  y = standards %>% nest(-date_field) %>%
    mutate(fit = map(data, ~lm(.[[std_field]] ~ .[[peak_field]], data = .)))

    return(y)  }

Then I can run the command:

test = standard.lm(standard_data, std_field = "std.ppm", peak_field = "area")

This works well as to generate lms for each date, but the problem is that the coefficient is named.[[peak_field]] instead of "area"

This creates a problem for me, because I would like to pass these lm objects on to predict() to predict ppm values from area measurements. My column in the next data table would be named area and I can't rename it to .[[peak_field]]. I try something like this and I get an error:

a = c(1300.1, 1400.3, 1500.9)
df = data.frame(area = a)
df$std.ppm = predict(test$fit[[1]], newdata = df)

Error in $<-.data.frame(*tmp*, std.ppm, value = c(1 = -0.00299110569401364, : replacement has 8 rows, data has 3 In addition: Warning message: 'newdata' had 3 rows but variables found have 8 rows

This is happening because predict() is looking for a column named .[[peak_field]] instead of recognizing area, and is predicting values for the original input lm data instead of the data I want it to predict.

So basically I'm looking for a solution to overcome this issue. The best solution would allow me to specify coefficient names when I initially create the lm objects in the first function, but I would be ok with something that allows me to specify which column to use in predict()

  • 1
    Use computing on the language. I'd use bquote and eval but I'm sure there is an equivalent in the tidyverse for the cool kids. – Roland Feb 22 '20 at 07:33
  • This might be a [starting point to catch up](https://dplyr.tidyverse.org/articles/programming.html#different-expressions) with the cool kids ;) – dario Feb 22 '20 at 08:13
  • 1
    https://stackoverflow.com/q/53635127/1412059 – Roland Feb 25 '20 at 07:01

1 Answers1

1

You can try to create a formula in the function using your defined y and x variable:

standard.lm= function(standards,date_field = "Date",
                      peak_field,std_field,peak_field2 = NA){
  lm_form = as.formula(paste(std_field,"~",peak_field))
  #another away
  #lm_form = substitute(y~x,list(y=as.name(std_field),x=as.name(peak_field)))
  y = standards %>% nest(data=-one_of(date_field)) %>%
    mutate(fit = map(data, ~lm(lm_form, data = .)))

    return(y)  }

We test it:

standard_data = structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("10/18/2019", 
"10/24/2019"), class = "factor"), ppm = c(0, 0, 0.1, 0.1, 0.1, 
1, 1, 1, 0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 1, 1, 1), area = c(0, 
0, 438.9804, 447.1784, 443.7794, 3232.2088, 3206.6672, 3206.232, 
0, 15.98, 0, 0, 379.387, 325.5268, 325.8126, 310.5972, 3259.366, 
3218.0836, 3192.7076)), class = "data.frame", row.names = c(NA, 
-19L))

mdl = standard.lm(standard_data, std_field = "ppm", peak_field = "area")

predict(mdl$fit[[1]], data.frame(area=c(1300.1,1400.3)))
        1         2 
0.3897161 0.4215205 
StupidWolf
  • 45,075
  • 17
  • 40
  • 72