I have a survey dataset that contains 100+ variables and almost all have 1-10 code values. The code values per column is provided in another df.
Example data:
survey_df = structure(list(resp_id = 1:5, gender = c("1", "2", "2", "1",
"1"), state = c("1", "2", "3", "1", "4"), education = c("1",
"1", "1", "2", "2")), class = "data.frame", row.names = c(NA,
-5L))
coded_df = structure(list(col = c("state", "gender", "education"), col_values = c("1-CA,2-TX,3-AZ,4-CO",
"1-Male,2-Female", "1-High School,2-Bachelor")), class = "data.frame", row.names = c(NA,
-3L))
Since the survey columns change over time/product I wanted to avoid any hard coded recodes and hence have a function that will take input a column name and return a "named vector" from coded_df.
get_named_vec <- function(x) {
tmp_chr <- coded_df %>%
filter(col == x) %>%
mutate(col_values = str_replace_all(col_values, "\\n", "")) %>%
separate_rows(col_values, sep = ",") %>%
separate(col_values, into = c("var1", "var2"), sep = "-") %>%
mutate(var1 = as.character(as.numeric(var1)),
var2 = str_trim(var2)) %>%
pull(var2, var1)
return(tmp_chr)
}
I then use the named vector as follows to update the survey_df.
survey_df%>%
mutate(gender = recode(gender,!!!get_named_vec("gender"),.default = "NA_character_"))
So far this work on per column basis which means 100+ executions!
But how do I make this run via mutate_at so that I selectively recode certain variables in single execution.
# This does not work.
to_update_col<-c("state","gender")
survey_df%>%
mutate_at(.vars=all_of(to_update_col),.funs=function(x) recode(x,!!!get_named_vec(x))))
Any help much appreciated!
Thanks
Vinay