I have a data frame with a dependent variable, several independent variables (indicators), and a filtering variable. My goal is to run regressions on the subsets defined by the categories of the filtering variable. For example, to run a regression for code == "all", I take my data frame, filter on code, and fit the model:
sample_tib %>%
  filter(code == "all") %>%
  glm(love ~ ., data = ., family = "gaussian")
But I am facing two problems:
- In my example above, glm() uses all columns, including code. The desired regression input is love ~ ind1 + ind2 + ... + ind_n.
- Filtering by each value of code in turn and running a separate model by hand is tedious and not really what I want.
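To make the desired formula concrete, here is a minimal sketch that builds love ~ ind1 + ind2 + ... from the column names with base R's reformulate(), so that code never enters the model. The data frame toy_df and its values are hypothetical stand-ins for sample_tib, not my real data.

```r
# Hypothetical toy data, smaller than sample_tib
set.seed(1)
toy_df <- data.frame(
  code = rep(c("all", "Data Science"), each = 10),
  love = runif(20),
  ind1 = runif(20),
  ind2 = runif(20)
)

# Every column except the response and the filtering variable is a predictor
predictors <- setdiff(names(toy_df), c("love", "code"))

# reformulate() turns the character vector into love ~ ind1 + ind2
model_formula <- reformulate(predictors, response = "love")

# Fit on a single category; code is used for filtering only
fit_all <- glm(
  model_formula,
  data = subset(toy_df, code == "all"),
  family = gaussian()
)

print(model_formula)
print(coef(fit_all))
```

Because the formula is derived from names(), the same code should keep working if more indicator columns are added.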
Maybe there is a function that filters the data frame, runs a regression, and nests the results in a new data frame or list? I tried to figure this out and came across this question and Dave Gruenewald's beautiful solution. But his approach handles only one pattern, x ~ y, with one dependent and one independent variable, which is not what I need.
So, are there any elegant solutions, or specific packages and functions, for this problem?
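To illustrate the kind of result I mean, here is a rough base-R sketch: split the data by code, fit the same model in every group, and collect the coefficients in one data frame. It is only a sketch under assumptions: toy_df is hypothetical data with two indicators, and split(), lapply() and do.call(rbind, ...) stand in for whatever a dedicated package might offer.

```r
# Hypothetical toy data with the same three categories as sample_tib
set.seed(1)
toy_df <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36),
  ind2 = runif(36)
)

# One data frame per category, with the filtering column dropped first
by_code <- split(toy_df[, names(toy_df) != "code"], toy_df$code)

# Fit the same model in each group; love ~ . is safe because code is gone
models <- lapply(by_code, function(d) {
  glm(love ~ ., data = d, family = gaussian())
})

# Collect the coefficients: one row per category, one column per term
coef_table <- do.call(rbind, lapply(names(models), function(nm) {
  data.frame(code = nm, t(coef(models[[nm]])), check.names = FALSE)
}))

print(coef_table)
```

The models list keeps the full glm objects, so summary(models[["all"]]) remains available for any single category.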
Data:
sample_tib <- data.frame(
  # 12 rows per category, 36 rows in total
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36),
  ind2 = runif(36),
  ind3 = runif(36),
  ind4 = runif(36),
  ind5 = runif(36),
  ind6 = runif(36),
  ind7 = runif(36)
)