This seems too complicated, but it's what I could come up with. (It would be much more efficient to do this without fitting the linear model as part of the pipeline, i.e. just identifying which samples were used; this might be doable with model.frame()
and some appropriate joining ...)
library(dplyr)
library(purrr)
library(broom)
library(tibble)
library(tidyr) ## needed for pivot_wider()
## same as before, but also convert rownames to a column
df <- mtcars %>%
mutate(disp = replace(hp, c(2, 3), NA),
wt = replace(wt, c(3, 4, 5), NA)) %>%
rownames_to_column("model")
## (1) set up vector of vars and give it names (for later .id=)
dd <- c("disp", "wt") %>%
setNames(c("samp1", "samp2")) %>%
## (2) construct formulas for lm
map(reformulate, response = "mpg") %>%
## (3) fit the lm
map(lm, data = df) %>%
## (4) generate fitted values
map_dfr(augment, newdata=df, .id="samp") %>%
select(samp, model, .fitted) %>%
## (5) flag which observations were used (TRUE = non-missing fitted value)
mutate(val = !is.na(.fitted)) %>%
## (6) pivot from one long column to two half-length columns
pivot_wider(names_from=samp, values_from=val, id_cols= model) %>%
## (7) add to df
full_join(df, by = "model")
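Steps (4) and (5) rely on the fact that predicting on the full data set returns NA for any row with a missing predictor. Here is a minimal base-R sketch of that idea, using predict() rather than augment() and a made-up data frame `toy` (not from the question), assuming the default na.action (na.omit) is in effect:

```r
## toy data (illustrative only): row 2 has a missing predictor
toy <- data.frame(y = c(1, 2, 3, 4), x = c(1, NA, 3, 5))

## lm() drops row 2 before fitting (default na.action = na.omit)
fit <- lm(y ~ x, data = toy)

## predict() with newdata keeps every row and returns NA
## wherever a predictor is missing
used <- !is.na(predict(fit, newdata = toy))
used
```

Note that this only detects rows dropped because of a missing predictor; a row with a missing response but complete predictors would still get a fitted value.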
This version does it without running the models.
## helper function: returns logical vector of whether observation
## was included in model frame or not
drop_vec <- function(mf) {
  nn <- attr(mf, "na.action")
  incl <- rep(TRUE, nrow(mf) + length(nn))
  incl[nn] <- FALSE
  incl
}
## first few bits are the same as above
dd <- c("disp", "wt") %>%
setNames(c("samp1", "samp2")) %>%
map(reformulate, response = "mpg") %>%
## only construct model frames - don't run lm()
map(model.frame, data = df) %>%
## apply helper function
map(drop_vec) %>%
## stick them together
bind_cols(df)
The only thing I don't like about this solution is that the samp
columns end up at the beginning; it would take a bit more fussing to get them as the last columns in the data frame.
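One possible way to do that fussing, as a sketch: dplyr (1.0.0 or later) has relocate(), which can move the indicator columns to the end. The tibble `res` below is a hypothetical stand-in for the result of the pipeline above, not the real output.

```r
library(dplyr)

## hypothetical stand-in for the pipeline result:
## indicator columns first, original data columns after
res <- tibble(samp1 = c(TRUE, FALSE),
              samp2 = c(TRUE, TRUE),
              model = c("a", "b"),
              mpg   = c(21, 22.8))

## move every column whose name starts with "samp" after the last column
res2 <- res %>%
  relocate(starts_with("samp"), .after = last_col())
names(res2)
```

Alternatively, ending the pipeline with bind_cols(df, .) instead of bind_cols(df) should put the columns of df first, since the dot placeholder controls where the piped value lands.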