I want to augment a data frame with the .fitted columns that broom::augment() produces for several lm() models, so that I can compare the predictions later. I rename() the .fitted column for each model, but augment() also adds .rownames, .se.fit, and other columns. So when I run augment() for the second model, I get

Error: Column .rownames must have a unique name

for each of the columns from augment() that I am not interested in.

I end up un-selecting each of the columns from augment() that I am not interested in. I have to do this after each call to augment().

{r}
require(tidyverse)
require(broom)
require(ISLR)

lm.obj <- lm(mpg ~ horsepower, data = Auto)
Auto <- Auto %>% 
  augment(x = lm.obj) %>% 
  rename(mpg_pred_linear = .fitted)

qd.obj <- lm(mpg ~ horsepower + I(horsepower^2), Auto)
Auto <- Auto %>% 
  augment(x = qd.obj) %>% 
  rename(mpg_pred_quad = .fitted)

I know I could use predict.lm() instead, but in the course I am teaching I have already used augment() to create the predictions, and I wanted consistency for my students.

Are there options in augment.lm() that I am not finding that allow for customization of column names? Or choice of output columns?

I am not sure if this is a problem I just need to cope with, or if I should submit it as an issue instead. Please advise (or chastise) as you see fit.

C. Peck
  • Are you sure you want the predictions in wide format rather than long format, which would be easier (and could be spread if necessary)? – Richard Telford Mar 26 '19 at 19:52
  • To clarify: is your general goal to attach only the fitted values for a linear and quadratic trend to the current dataset (at least, in this example)? And to do so in a way your students will understand :)? – Andrew Mar 26 '19 at 20:00
  • Richard, I am not quite sure what you mean. – Murphy Waggoner Mar 27 '19 at 15:58
  • Andrew, the goal is to have columns for the fitted values for linear, quadratic, and other models for both the training and test sets. The ultimate goal is to calculate (by hand, i.e., using R and formulas) the MSE for each model to illustrate using the MSE for model selection and to illustrate that training error is less than test error. This is part of a lecture to motivate LOOCV, k-fold CV, and bootstrap. I want the students to learn the concepts and code for these three CV methods, and I don't want to muddy the lecture with new code only used for illustration. – Murphy Waggoner Mar 27 '19 at 16:03

1 Answer

One option is to use rename_at() with starts_with() to pick out every column that augment() just added (they all begin with a dot) and prefix each of them with a model label:

library(dplyr)
library(broom)
lm.obj <- lm(Petal.Width ~ Sepal.Width, data = iris)
iris <- iris %>% 
  augment(x = lm.obj) %>%
  # prefix every augment() column (they start with '.') with 'mod1'
  rename_at(vars(starts_with('.')), ~ paste0('mod1', .))

colnames(iris)
#  [1] "Sepal.Length"   "Sepal.Width"    "Petal.Length"   "Petal.Width"    "Species"        "mod1.fitted"    "mod1.se.fit"   
#  [8] "mod1.resid"     "mod1.hat"       "mod1.sigma"     "mod1.cooksd"    "mod1.std.resid"

This way, you can repeat the step for each additional model without name clashes:

qd.obj <- lm(Petal.Width ~ Species + I(Petal.Length^2), iris)
iris <- iris %>% 
  augment(x = qd.obj) %>% 
  # same pattern, with a different prefix for the second model
  rename_at(vars(starts_with('.')), ~ paste0('mod2', .))
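If only the fitted values are needed, the same idea can be wrapped in a small helper. This is a minimal sketch, not something broom provides: add_fitted() is a hypothetical name, and mtcars (with hp standing in for horsepower) is assumed here only so the example runs without ISLR. It calls augment() with newdata, pulls out .fitted, and stores it under a name of your choice, so the other augment() columns never reach the data frame.

```r
library(dplyr)
library(broom)

# Hypothetical helper (not part of broom): append one model's fitted
# values to `data` in a column called `name`, discarding every other
# column that augment() adds.
add_fitted <- function(data, model, name) {
  data[[name]] <- augment(model, newdata = data) %>%
    pull(.fitted)
  data
}

# Assumption: mtcars stands in for the Auto data from the question.
cars <- mtcars

lm_fit <- lm(mpg ~ hp, data = cars)
qd_fit <- lm(mpg ~ hp + I(hp^2), data = cars)

cars <- cars %>%
  add_fitted(lm_fit, "mpg_pred_linear") %>%
  add_fitted(qd_fit, "mpg_pred_quad")

# Training MSE for each model, computed from the new columns.
mse_linear <- mean((cars$mpg - cars$mpg_pred_linear)^2)
mse_quad   <- mean((cars$mpg - cars$mpg_pred_quad)^2)
```

Because the helper takes the data frame as its first argument, it also works on a held-out test set: pass the test data and the model fitted on the training data.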
demarsylvain
  • Of course, rename_at(). It is the same length as the code I had. Thanks. I actually use augment(x = lm.obj, newdata = iris) to reduce the number of cols that augment() returns. With rename, I can keep them all, named by model. Thanks. – Murphy Waggoner Mar 26 '19 at 20:20