How can I create a column of formulas (such as y ~ x or y ~ log(x) or ...) from a nested dataframe of models?
In the attempt below, the model column contains, for each country, the model with the largest R-squared value. The purpose of creating a column of model formulas is to identify which model was used in each row.
library(tidyverse)
library(broom)

# One row per country, with the (x, y) observations nested in `data`
df <- gapminder::gapminder %>%
  select(country, x = year, y = lifeExp) %>%
  group_by(country) %>%
  nest()

# R squared of a fitted lm
rsq_f <- function(model) {
  summary(model)$r.squared
}

# Fit four candidate models and return the one with the largest R squared
best_model <- function(df) {
  models <- list(
    lm(formula = y ~ x, data = df),
    lm(formula = y ~ log(x), data = df),
    lm(formula = log(y) ~ x, data = df),
    lm(formula = log(y) ~ log(x), data = df)
  )
  R_squared <- map_dbl(models, rsq_f)
  best_model_num <- which.max(R_squared)
  models[[best_model_num]]
}

models <- df %>%
  mutate(
    model = map(data, best_model),
    rsq = map(model, broom::glance) %>% map_dbl("r.squared"),
    fun_call = map(model, formula)
  )
The output is:

> models
# A tibble: 142 x 5
   country     data              model    rsq   fun_call
   <fct>       <list>            <list>   <dbl> <list>
 1 Afghanistan <tibble [12 x 2]> <S3: lm> 0.949 <S3: formula>
 2 Albania     <tibble [12 x 2]> <S3: lm> 0.912 <S3: formula>
 3 Algeria     <tibble [12 x 2]> <S3: lm> 0.986 <S3: formula>
 4 Angola      <tibble [12 x 2]> <S3: lm> 0.890 <S3: formula>
 5 Argentina   <tibble [12 x 2]> <S3: lm> 0.996 <S3: formula>
 6 Australia   <tibble [12 x 2]> <S3: lm> 0.983 <S3: formula>
 7 Austria     <tibble [12 x 2]> <S3: lm> 0.994 <S3: formula>
 8 Bahrain     <tibble [12 x 2]> <S3: lm> 0.968 <S3: formula>
 9 Bangladesh  <tibble [12 x 2]> <S3: lm> 0.997 <S3: formula>
10 Belgium     <tibble [12 x 2]> <S3: lm> 0.995 <S3: formula>
# ... with 132 more rows
Instead of <S3: formula>, I'd like to actually see the formula used by the model.
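One direction that might work, sketched here under the assumption that a plain character column is acceptable: extract each formula with formula(), turn it into text with deparse(), and collect the results with map_chr() instead of map(). The toy data frame and the fits list below are hypothetical stand-ins for the nested model column, not part of the gapminder example.

```r
library(purrr)

# Hypothetical toy data standing in for one country's (x, y) observations
toy <- data.frame(
  x = 1:10,
  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2)
)

# Hypothetical stand-in for the list-column of fitted models
fits <- list(
  lm(y ~ x, data = toy),
  lm(log(y) ~ log(x), data = toy)
)

# formula() extracts the formula object from each lm;
# deparse() converts it to text. paste(collapse = " ") guards against
# long formulas that deparse() would split into several strings.
formula_text <- map_chr(fits, ~ paste(deparse(formula(.x)), collapse = " "))

print(formula_text)
# expected: "y ~ x" and "log(y) ~ log(x)"
```

If this approach fits, swapping map(model, formula) for the same map_chr() call inside the mutate() should give a <chr> column that prints the formula text directly, though I have not confirmed that on the full gapminder pipeline.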