I have a dataset of students, and I want to create a column that categorizes each student based on their and other students' data in their "subject" and "grade_letter" column because I eventually want to create a column that says "math_science_A", "science_B", etc. to know how many students got As, Bs, Cs, etc. based on the unique make ups.
I have a list of all potential combinations that a student can have, but when I try to iterate over R, my case_when only gives me the first category. I then tried to make multiple columns, so I could pivot longer later, but that also seemed to fail. Any help would be appreciated!
#Here's my sample dataset
library(tidyverse)
test <- tibble(student_id = c(1),
subject = c("math"),
grade_num = c(95),
grade = c("a"))
#Here's a sample of what I want (full desired output would be too long to write; see text below code chunks)
sample_of_desired_output <- tibble(student_id = c(1),
subject = c("math"),
grade_num = c(95),
grade = c("a"),
category = c("n_math_a", "n_science_a", "n_english_a", "n_math_science_a", "n_math_english_a"))
#####Here's what I tried
#Getting all combinations--this part works
all_subjects <- c("math", "science", "english")
i = 1
while(i <= length(all_subjects)) {
if(i == 1) {
all_subject_combos <- c()
} else {
some_combos <- combn(all_subjects, i, FUN = function(x) paste(x, collapse = "_"), simplify = TRUE)
all_subject_combos <- append(all_subject_combos, some_combos)
}
i <- i +1
}
all_grades <- c("A", "B", "C", "D", "F")
all_subjects_and_grades <- expand.grid(all_subject_combos, all_grades)
subjects_and_grades_combos <- all_subjects_and_grades %>%
unite("names", c(Var1, Var2)) %>%
mutate(names = tolower(paste0("n_", str_replace_all(names, "\\|", "_")))) %>%
pull(names)
#This part does not work but is what I tried:
i = 1
while(i <= length(subjects_and_grades_combos)) {
does_not_work <- test %>%
mutate("subject_grade_grouping_{{i}}" := case_when(
str_detect(subjects_and_grades_combos[i], subject) ~ subjects_and_grades_combos[i]))
i <- i +1
}
does_not_work
Note that in the final, there will be all potential combinations, so each student will have their data repeating many times. After I have this, I will be using this to calculate the number of students who fall into each category. So, in this example, math_a, math_science_a, math_english_a, and math_science_english_a would all have 1. Note that I do not need help calculating that part, and I am flexible with the order of the columns.