1

I have a survey dataset that contains 100+ variables and almost all have 1-10 code values. The code values per column is provided in another df.

Example data:


survey_df = structure(list(resp_id = 1:5, gender = c("1", "2", "2", "1", 
"1"), state = c("1", "2", "3", "1", "4"), education = c("1", 
"1", "1", "2", "2")), class = "data.frame", row.names = c(NA, 
-5L))

coded_df = structure(list(col = c("state", "gender", "education"), col_values = c("1-CA,2-TX,3-AZ,4-CO", 
"1-Male,2-Female", "1-High School,2-Bachelor")), class = "data.frame", row.names = c(NA, 
-3L))

Since the survey columns change over time/product I wanted to avoid any hard coded recodes and hence have a function that will take input a column name and return a "named vector" from coded_df.

get_named_vec <- function(x) {
  tmp_chr <- coded_df %>%
    filter(col == x) %>%
    mutate(col_values = str_replace_all(col_values, "\\n", "")) %>%
    separate_rows(col_values, sep = ",") %>%
    separate(col_values, into = c("var1", "var2"), sep = "-") %>%
    mutate(var1 = as.character(as.numeric(var1)), 
           var2 = str_trim(var2)) %>%
    pull(var2, var1)

  return(tmp_chr)
  
}

I then use the named vector as follows to update the survey_df.

survey_df%>%
 mutate(gender = recode(gender,!!!get_named_vec("gender"),.default = "NA_character_"))

So far this work on per column basis which means 100+ executions!

But how do I make this run via mutate_at so that I selectively recode certain variables in single execution.

# This does not work.
to_update_col<-c("state","gender")
survey_df%>%
  mutate_at(.vars=all_of(to_update_col),.funs=function(x) recode(x,!!!get_named_vec(x))))

Any help much appreciated!

Thanks

Vinay

Vinay
  • 243
  • 2
  • 5
  • `get_named_vector()` does not work. For starters, the column reference to `col_value` should be `col_values` in `coded_df`. In `coded_df` the replacement values are on the LHS of the equals sign for `state` (assuming since they're coded as "CO", "CA", etc. in `survey_df`), but for the other variables, the replacement values are on the RHS of the equals sign. Was that your intention? Are you trying to recode survey values from numeric to string representation? – LMc Apr 13 '21 at 17:17
  • typos during creation of dummy df + function. Fixed with updates. – Vinay Apr 14 '21 at 04:15

1 Answers1

1

I expect it will be simpler and more performant to convert this into a pivot-join-pivot operation, where you make the source and lookup tables into long format, join them, and reshape wide again.

Given this survey info:

survey_df = structure(list(resp_id = 1:5, 
                           gender = c(1L, 2L, 2L, 1L, 1L), 
                           state = c(1, 2, 3, 1, 4), 
                           education = c(1L, 1L, 1L, 2L, 2L)), class = "data.frame", row.names = c(NA, -5L)) %>%
 mutate(across(-resp_id, as.character))

We can convert the lookup table into long form:

coded_df_long <- coded_df %>%
  separate_rows(col_values, sep = ",") %>%
  separate(col_values, c("old", "new"), extra = "merge")

Then pivot the survey long, join to the codings, and pivot wide again.

survey_df %>%
  pivot_longer(-resp_id) %>%
  left_join(coded_df_long, by = c("name" = "col", "value" = "old")) %>%
  select(-value) %>%
  pivot_wider(names_from = name, values_from = new)

Result

# A tibble: 5 x 4
  resp_id gender state education  
    <int> <chr>  <chr> <chr>      
1       1 Male   CA    High School
2       2 Female TX    High School
3       3 Female AZ    High School
4       4 Male   CA    Bachelor   
5       5 Male   CO    Bachelor   
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
  • This is excellent!.. This works on the dummy data frames in the question. I am trying this with my actual survey and will revert. Funny that I was fixated on "recode" that I missed on the joins!!! – Vinay Apr 14 '21 at 04:17
  • Keeping this here as well in case someone navigates here for similar problem. While the pivots/seperate helped me, the following can also be used directed as I had intended in the first place via function. survey_df <- survey_df %>% mutate_at(vars(to_update_col), function(x) { var_name <- rlang::as_label(substitute(x)) ### This was failing for me in my original attempt. recode(x,!!!get_named_vec(var_name), .default = NA_character_) }) – Vinay Apr 19 '21 at 13:57