2

I want to recode values in one dataset based on values in another dataset. My overall goal is to apply recode across multiple columns of the dataframe.

Data:

df <- data.frame(
  gender=c(1,2,1,2),
  condition=c(1,1,2,2)
)
df

  gender condition
1      1         1
2      2         1
3      1         2
4      2         2

Other dataset:

codes <- data.frame(
  gender_values= c("`1`='male', `2`='female'"),
  condition_values = c("`1`='exp', `2`='control'")
)

codes

             gender_values         condition_values
1 `1`='male', `2`='female' `1`='exp', `2`='control'

Attempt:

df %>% 
  dplyr::mutate(
  gender= dplyr::recode(gender, cat(noquote(codes[1,"gender_values"])), .default = NA_character_)
)

`1`='male', `2`='female'  gender condition
1   <NA>         1
2   <NA>         1
3   <NA>         2
4   <NA>         2

Wanted:

  gender condition
1   male       exp
2 female       exp
3   male   control
4 female   control
jpsmith
  • 11,023
  • 5
  • 15
  • 36
EML
  • 615
  • 4
  • 14
  • In your `codes`, you have `c("`1`='male', `2`='female'")` which is one long, single string. Was this intentional, or is this supposed to be two elements (`1="male"` and `2="female"`)? Something like, `c("1='male'", "2='female'")`? – jpsmith Jan 18 '23 at 14:17
  • You sure that codes has this weird form? Why not log form (i.e. variable_name, variable_vaule, variable_text? Or wide form? Most straight forward solution would be to use a join to bind the "recoded" values to the numbers... – dario Jan 18 '23 at 14:18
  • 1
    Does this answer your question? [Recoding values in second data frame based on values in a different data frame](https://stackoverflow.com/questions/71819396/recoding-values-in-second-data-frame-based-on-values-in-a-different-data-frame) – dario Jan 18 '23 at 14:19
  • @jpsmith I intended a long string so recode can use the values. For some reason, recode is not using the values in the same way as recode(gender, `1`="male"...). – EML Jan 18 '23 at 14:28
  • @dario I believe it does not because the value labels for my dataset are in one cell for each variable. – EML Jan 18 '23 at 15:31
  • @EML then you'd most likely want to extract them into a more suitable form (I don't see why you'd want to have a string there encoding that information when any list/data.frame would make much more sense. But anyways... Best of luck! – dario Jan 18 '23 at 15:35

1 Answers1

2

If you want to use dplyr::recode, you can exploit the splice operator !!! to help out here and evaluate the values in codes. It helps if you simplify your codes data:

codes <- data.frame(
  gender_values = c("male", "female"),
  condition_values = c("exp", "control")
)

Then, for example, on a single column you can do:

dplyr::recode(df$gender, !!!codes$gender_values)
# [1] "male"   "female" "male"   "female"

One way to apply it across columns given your example data is to use sapply:

sapply(names(df), function(x) dplyr::recode(df[,x], !!!codes[,paste0(x, "_values")]))

#      gender   condition
# [1,] "male"   "exp"    
# [2,] "female" "exp"    
# [3,] "male"   "control"
# [4,] "female" "control"

(Note this specific example assumed all column names in codes for variable “x” (in df) is “x_values”, as in your example data)

Also, if you wanted to keep you codes values exactly as is, you could do:

codes <- data.frame(
  gender_values= c("`1`='male', `2`='female'"),
  condition_values = c("`1`='exp', `2`='control'")
)

# single column example

dplyr::recode(df$gender, !!!strsplit(codes$gender_values, ",")[[1]])
# [1] "`1`='male'"    " `2`='female'" "`1`='male'"    " `2`='female'"

# multiple columns
sapply(names(df), function(x) dplyr::recode(df[,x], !!!strsplit(codes[,paste0(x, "_values")], ",")[[1]]))

#      gender          condition       
# [1,] "`1`='male'"    "`1`='exp'"     
# [2,] " `2`='female'" "`1`='exp'"     
# [3,] "`1`='male'"    " `2`='control'"
# [4,] " `2`='female'" " `2`='control'"

To clean it up a bit, though, you could preprocess the original codes:

codes2 <- lapply(codes[], function(x)
  gsub("`|'|=|[[:digit:]]+", "", trimws(unlist(strsplit(x, ",")))))

sapply(names(df), function(x) dplyr::recode(df[,x], !!!codes2[[paste0(x, "_values")]]))

#      gender   condition
# [1,] "male"   "exp"    
# [2,] "female" "exp"    
# [3,] "male"   "control"
# [4,] "female" "control"

I would also advise changing "exp" to "exposure" or "expos" or something else, as exp is a function in R to calculate exponentials, and its good practice not to confuse things!

jpsmith
  • 11,023
  • 5
  • 15
  • 36
  • I’d probably need the dput - this is best suited for its own question as it may be valuable to others. Ping me if you create a new one - Good luck! – jpsmith Jan 19 '23 at 21:09