I'm trying to write a function that creates a list of lm objects from a dataframe, with a different lm for each unique date in my data set. I would then like to be able to pass these lms into predict() with a new dataframe to generate predicted values and confidence intervals.
Data looks like this:
Date ppm area
10/18/2019 0 0
10/18/2019 0 0
10/18/2019 0.1 438.9804
10/18/2019 0.1 447.1784
10/18/2019 0.1 443.7794
10/18/2019 1 3232.2088
10/18/2019 1 3206.6672
10/18/2019 1 3206.232
10/24/2019 0 0
10/24/2019 0 15.98
10/24/2019 0 0
10/24/2019 0 0
10/24/2019 0.1 379.387
10/24/2019 0.1 325.5268
10/24/2019 0.1 325.8126
10/24/2019 0.1 310.5972
10/24/2019 1 3259.366
10/24/2019 1 3218.0836
10/24/2019 1 3192.7076
The first part, a function that fits a separate lm for each date, seems simple:
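library(tidyverse)  # tidyr::nest(), purrr::map(), dplyr::mutate()
library(broom)

# Fit one lm per unique value of date_field (std_field regressed on peak_field)
standard.lm = function(standards,
                       date_field = "date",
                       peak_field,
                       std_field,
                       peak_field2 = NA) {
  standards %>%
    nest(data = -all_of(date_field)) %>%
    mutate(fit = map(data, ~ lm(.[[std_field]] ~ .[[peak_field]], data = .)))
}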
Then I can run the command:
test = standard.lm(standard_data, std_field = "std.ppm", peak_field = "area")
This works for generating one lm per date, but the coefficient for the predictor is named ".[[peak_field]]" instead of "area".
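For a numeric predictor, lm() names the coefficient after the text of the term in the formula, which is why the label reads ".[[peak_field]]". A minimal sketch, assuming base R's split()/lapply() in place of nest()/map() and a hypothetical function name standard_lm_fixed(), builds the formula from the column names with reformulate() so the coefficient is called "area". The sample data below use ppm as the response, while the call above uses std.ppm, so substitute the real column name.

```r
# Hypothetical helper (not from any package): one lm per date, with the formula
# built from column names so the coefficient is named after the real column.
standard_lm_fixed <- function(standards, date_field = "Date",
                              peak_field, std_field) {
  # e.g. reformulate("area", response = "ppm") gives ppm ~ area
  fml <- reformulate(peak_field, response = std_field)
  # split() returns a list of per-date data frames, named by date
  lapply(split(standards, standards[[date_field]]),
         function(d) lm(fml, data = d))
}

# The sample standards from the question
standards <- data.frame(
  Date = rep(c("10/18/2019", "10/24/2019"), times = c(8, 11)),
  ppm  = c(0, 0, 0.1, 0.1, 0.1, 1, 1, 1,
           0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 1, 1, 1),
  area = c(0, 0, 438.9804, 447.1784, 443.7794, 3232.2088, 3206.6672, 3206.232,
           0, 15.98, 0, 0, 379.387, 325.5268, 325.8126, 310.5972,
           3259.366, 3218.0836, 3192.7076)
)

fits <- standard_lm_fixed(standards, peak_field = "area", std_field = "ppm")
names(fits)               # "10/18/2019" "10/24/2019"
names(coef(fits[[1]]))    # "(Intercept)" "area"
```

In the original nest()/map() pipeline, the equivalent change would be map(data, ~ lm(reformulate(peak_field, response = std_field), data = .x)).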
That is a problem, because I would like to pass these lm objects to predict() to predict ppm values from area measurements. The column in my new data table is named area, and I can't rename it to ".[[peak_field]]". When I try something like this, I get an error:
a = c(1300.1, 1400.3, 1500.9)
df = data.frame(area = a)
df$std.ppm = predict(test$fit[[1]], newdata = df)
Error in `$<-.data.frame`(`*tmp*`, std.ppm, value = c(`1` = -0.00299110569401364,  : 
  replacement has 8 rows, data has 3
In addition: Warning message:
'newdata' had 3 rows but variables found have 8 rows
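This happens because predict() looks for a variable named ".[[peak_field]]" instead of recognizing area. Since my new data frame has no such variable, the expression is evaluated against the original data the model was fitted on, so predict() returns values for the 8 training rows instead of the 3 rows I want to predict.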
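The row mismatch can be reproduced without the tidyverse. In this sketch, a hypothetical helper fit_like_map() takes an argument literally named ".", mimicking the ~ lambda inside map(), and fits the same kind of formula to the 10/18/2019 rows only. Because the new data frame has no object called ".", model.frame() falls back to the formula's environment, where "." is still the 8-row training data.

```r
# The 10/18/2019 standards from the question (8 rows)
d1018 <- data.frame(
  ppm  = c(0, 0, 0.1, 0.1, 0.1, 1, 1, 1),
  area = c(0, 0, 438.9804, 447.1784, 443.7794, 3232.2088, 3206.6672, 3206.232)
)

# Hypothetical helper whose argument is named ".", like the lambda in map()
fit_like_map <- function(., std_field, peak_field) {
  lm(.[[std_field]] ~ .[[peak_field]], data = .)
}

fit <- fit_like_map(d1018, std_field = "ppm", peak_field = "area")
names(coef(fit))   # "(Intercept)" ".[[peak_field]]"

new_df <- data.frame(area = c(1300.1, 1400.3, 1500.9))
# new_df has no column named ".", so ".[[peak_field]]" is evaluated in the
# formula's environment (the 8 training rows); R warns about the row mismatch.
p <- predict(fit, newdata = new_df)
length(p)          # 8, not 3
```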
So I'm looking for a way around this. Ideally I could specify the coefficient names when I first create the lm objects in standard.lm(), but I would also be fine with a way to tell predict() which column to use.
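If the formula is built from column names (for example with reformulate()), predict() finds area in the new data frame, and predict.lm's interval = "confidence" argument adds lwr and upr columns. The sketch below predicts from every per-date model at once; the base-R split()/lapply() approach and the names fits and preds are assumptions for illustration, not the original code, and ppm stands in for the response column (std.ppm in the call above).

```r
standards <- data.frame(
  Date = rep(c("10/18/2019", "10/24/2019"), times = c(8, 11)),
  ppm  = c(0, 0, 0.1, 0.1, 0.1, 1, 1, 1,
           0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 1, 1, 1),
  area = c(0, 0, 438.9804, 447.1784, 443.7794, 3232.2088, 3206.6672, 3206.232,
           0, 15.98, 0, 0, 379.387, 325.5268, 325.8126, 310.5972,
           3259.366, 3218.0836, 3192.7076)
)

# One model per date; the coefficient is named "area" because the formula
# is built from column names rather than from .[[...]] expressions.
fml  <- reformulate("area", response = "ppm")
fits <- lapply(split(standards, standards$Date), function(d) lm(fml, data = d))

new_df <- data.frame(area = c(1300.1, 1400.3, 1500.9))

# interval = "confidence" returns a matrix with columns fit, lwr and upr
# (95% level by default); each block of rows is tagged with its date.
preds <- do.call(rbind, lapply(names(fits), function(dt) {
  ci <- predict(fits[[dt]], newdata = new_df, interval = "confidence")
  data.frame(Date = dt, area = new_df$area, ci, row.names = NULL)
}))
preds
```

Each date contributes 3 rows, so preds has 6 rows with columns Date, area, fit, lwr and upr.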