1

I am processing a large dataset from a questionnaire that contains coded responses in some but not all columns. I would like to replace the coded responses with actual values. The key/dictionary is stored in another database. The complicating factor is that different questions (stored as columns in original dataset) used the same code (typically numeric), but the code has different meanings depending on the column (question).

How can I replace the coded values in the original dataset with different valuse from a corresponding key stored in the dictionary table, but do it by specific column name (also stored in the dictionary table)?

Below is an example of the original dataset and the dictionary table, as well as desired result.

original <- data.frame(
  name = c('Jane','Mary','John', 'Billy'),
  home = c(1,3,4,2),
  car = c('b','b','a','b'),
  shirt = c(3,2,1,1),
  shoes = c('Black','Black','Black','Brown')
  )

keymap <- data.frame(
  column_name=c('home','home','home','home','car','car','shirt','shirt','shirt'),
  value_old=c('1','2','3','4','a','b','1','2','3'),
  value_new=c('Single family','Duplex','Condo','Apartment','Sedan','SUV','White','Red','Blue')
)

result <- data.frame(
  name = c('Jane','Mary','John', 'Billy'),
  home = c('Single family','Condo','Apartment','Duplex'),
  car = c('SUV','SUV','Sedan','SUV'),
  shirt = c('Blue','Red','White','White'),
  shoes = c('Black','Black','Black','Brown')
)

> original
   name home car shirt shoes
1  Jane    1   b     3 Black
2  Mary    3   b     2 Black
3  John    4   a     1 Black
4 Billy    2   b     1 Brown


> keymap
  column_name value_old     value_new
1        home         1 Single family
2        home         2        Duplex
3        home         3         Condo
4        home         4     Apartment
5         car         a         Sedan
6         car         b           SUV
7       shirt         1         White
8       shirt         2           Red
9       shirt         3          Blue


> result
   name          home   car shirt shoes
1  Jane Single family   SUV  Blue Black
2  Mary         Condo   SUV   Red Black
3  John     Apartment Sedan White Black
4 Billy        Duplex   SUV White Brown

I have tried different approaches using dplyr but have not gotten far as I do not have a robust understanding of the mutate/join syntax.

sk454
  • 25
  • 3

2 Answers2

1

We may loop across the unique values from the 'column_name' column of 'keymap' in the original, subset the keymap that matches the column name (cur_column()), select the columns 2 and 3, deframe to a named vector and match with the values of the column for replacement

library(dplyr)
library(tibble)
original %>% 
   mutate(across(all_of(unique(keymap$column_name)), ~ 
   (keymap %>%
            filter(column_name == cur_column()) %>%
            select(-column_name) %>%
            deframe)[as.character(.x)]))

-output

   name          home   car shirt shoes
1  Jane Single family   SUV  Blue Black
2  Mary         Condo   SUV   Red Black
3  John     Apartment Sedan White Black
4 Billy        Duplex   SUV White Brown

Or an approach in base R

lst1 <- split(with(keymap, setNames(value_new, value_old)), keymap$column_name)
original[names(lst1)] <- Map(\(x, y) y[as.character(x)], 
       original[names(lst1)], lst1)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • May I ask, in the dplyr/tibble approach, is there a way to select columns 2 and 3 by name ("value_old", "value_new")? – sk454 Feb 04 '23 at 23:04
  • @sk454 you can convert the earlier version of `[2:3]` to `[c("value_old", "value_new")]` or make everything in tidyverse, (as in the update) – akrun Feb 04 '23 at 23:07
1

Please check below code where we can use the factor to replace the values in one column with data from another dataframe here in this case with keymap

library(tidyverse)

original %>% mutate(home=factor(home, keymap$value_old, keymap$value_new),
             car=factor(car, keymap$value_old, keymap$value_new),
             shirt=factor(shirt, keymap$value_old, keymap$value_new)
             )

Created on 2023-02-04 with reprex v2.0.2

   name          home   car         shirt shoes
1  Jane Single family   SUV         Condo Black
2  Mary         Condo   SUV        Duplex Black
3  John     Apartment Sedan Single family Black
4 Billy        Duplex   SUV Single family Brown
jkatam
  • 2,691
  • 1
  • 4
  • 12
  • 1
    Thanks. Unfortunately this requires specifying/typing out the individual columns, so it doesn't scale well. – sk454 Feb 06 '23 at 17:43