I have a data frame with several binary yes/no variables coded as 0/1 (23 in my real data, 7 in the example below), and I am trying to identify the pairs of variables that are both "yes" (1) in the same row:
```r
library(dplyr)

set.seed(42)  # make the random example reproducible
df <- tibble(V1 = sample(c(0, 1), 25, replace = TRUE, prob = c(0.6, 0.4)),
             V2 = sample(c(0, 1), 25, replace = TRUE, prob = c(0.6, 0.4)),
             V3 = sample(c(0, 1), 25, replace = TRUE, prob = c(0.8, 0.2)),
             V4 = sample(c(0, 1), 25, replace = TRUE, prob = c(0.7, 0.3)),
             V5 = sample(c(0, 1), 25, replace = TRUE, prob = c(0.8, 0.2)),
             V6 = sample(c(0, 1), 25, replace = TRUE, prob = c(0.8, 0.2)),
             V7 = sample(c(0, 1), 25, replace = TRUE, prob = c(0.8, 0.2)))
```
If I wanted to identify every unique combination of values across all columns, I would use `cur_group_id()` like this:
```r
df %>%
  group_by(across(everything())) %>%
  mutate(combo_id = cur_group_id())
```
But what I actually want is to identify combinations of pairs of "yes" conditions. For example, I want to identify rows where V1 == 1 & V2 == 1, ignoring what any of the other columns contain.
So basically I want to do this:
```r
df %>%
  mutate(combo_id = case_when(V1 == 1 & V2 == 1 ~ "V1_V2"))
```
but I want to apply this to every possible two-variable combination of the columns in my data frame, not just V1 and V2.
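If what you ultimately need is how often each pair is "yes" together, a small sketch (assuming every column is coded 0/1 and using a hypothetical toy data frame `toy` in place of `df`) is to enumerate the pairs with `combn()` and count co-occurrences with `crossprod()`. For a 0/1 matrix `X`, `crossprod(X)` equals `t(X) %*% X`, so entry `[i, j]` is the number of rows where both column `i` and column `j` are 1. With 7 columns there are `choose(7, 2)` = 21 pairs; with 23 columns there are 253.

```r
# Hypothetical toy data with 0/1 columns, standing in for the real df
toy <- data.frame(V1 = c(1, 1, 0, 1, 1),
                  V2 = c(1, 0, 1, 1, 1),
                  V3 = c(0, 1, 1, 1, 0))

# All unordered pairs of column names: choose(ncol(toy), 2) of them
pairs <- combn(names(toy), 2, simplify = FALSE)

# For 0/1 data, entry [i, j] counts rows where both columns equal 1
co <- crossprod(as.matrix(toy))

# One row per pair with its co-occurrence count
pair_counts <- data.frame(
  pair   = vapply(pairs, paste, character(1), collapse = "_"),
  n_both = vapply(pairs, function(p) co[p[1], p[2]], numeric(1))
)
pair_counts
```

Here `pair_counts` has the rows `V1_V2` (3), `V1_V3` (2) and `V2_V3` (2). This only gives counts, not a per-row label.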
Maybe this is a job for map()? I'm stuck.
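One thing to keep in mind: a single `case_when()` can only assign one label per row, but a row can satisfy several pairs at once (e.g. V1, V2 and V3 all equal 1). A minimal base-R sketch of the mapping approach, again using a hypothetical toy data frame `toy`, builds one logical column per pair and then collapses all matching pairs into one string per row. `lapply()` is used here; `purrr::map()` would work the same way.

```r
# Hypothetical toy data with 0/1 columns, standing in for the real df
toy <- data.frame(V1 = c(1, 1, 0, 1, 1),
                  V2 = c(1, 0, 1, 1, 1),
                  V3 = c(0, 1, 1, 1, 0))

pairs <- combn(names(toy), 2, simplify = FALSE)

# One logical column per pair: TRUE when both columns equal 1 in that row
flags <- lapply(pairs, function(p) toy[[p[1]]] == 1 & toy[[p[2]]] == 1)
names(flags) <- vapply(pairs, paste, character(1), collapse = "_")
flags <- as.data.frame(flags)

# Collapse every matching pair into one label per row;
# rows with no matching pair get an empty string
toy$combo_id <- apply(as.matrix(flags), 1, function(r) {
  paste(names(flags)[r], collapse = ";")
})
toy
```

Row 4 (all three columns equal 1) gets `"V1_V2;V1_V3;V2_V3"`, which a single `case_when()` could not express. If you prefer indicator columns instead of a combined label, `cbind(toy, flags)` keeps one TRUE/FALSE column per pair.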