Recoding several variables in for loop

Question

I am working with some survey data where some of the responses have been coded inconsistently. For instance' "I don't know" may be coded as 4, 5, 97, or 777. I want to standardize all of these responses as "77" as efficiently as possible. I would like to use a crosswalk and want to avoid creating a new recode command for every variable in which this occurs, if possible, since there are several.

library(tidyverse)

#df with the inconsistent fourth category
var1 <- c("1", "2", "3", "4")
var2 <- c("1", "2", "3", "5")
var3 <- c("1", "2", "3", "97")
var4 <- c("1", "2", "3", "777")

df <- data.frame(var1, var2, var3, var4)

var <- c("var1", "var2", "var3", "var4")
oldvalue <- c("4", "6", "97", "777")
newvalue <- c("77", "77", "77", "77")

#crosswalk of old values to new values
cw <- data.frame(var, oldvalue, newvalue)

recodevars = cw$var

A few things I have tried are as follows, although I haven't had any luck with any. Please let me know if you have any suggestions.

rec_all = df %>%
  transmute_at(vars(recodevars), funs(recode(., cw$oldvalue = cw$newvalue)))


for(i in recodevars){
  rec_all = df %>%
    transmute_at(vars(recodevars), funs(ifelse(i == cw$oldval, cw$newval, i)))
}

score 6 · Answer 1 · answered Jul 19 '21 at 21:24

Loop across the 'recodevars' columns, subset the 'cw' dataset rows based on the column name (cur_column()) while removing the first column, use that to match and replace the values of the data, and coalesce with the original column so that the NA elements for non-matches are replaced with corresponding original column values

library(dplyr)
library(tibble)
df %>% 
    mutate(across(all_of(recodevars), 
       ~ coalesce(deframe(cw[cw$var ==cur_column(),][-1])[as.character(.)], .)))

-output

  var1 var2 var3 var4
1    1    1    1    1
2    2    2    2    2
3    3    3    3    3
4   77    5   77   77

score 4 · Answer 2 · answered Jul 19 '21 at 21:30

4

if the values are independed of the column like in your example, then you can simply use this code:

df %>%
    dplyr::mutate(across(everything(), ~ ifelse(.x %in% c(4,6,97,777), 77, .x)))

answered Jul 19 '21 at 21:30

DPH

4,244
1
8
18

score 4 · Answer 3 · answered Jul 19 '21 at 21:32

Though not part of the tidyverse apply is the old standard function to apply a function to any value in a data.frame, essentially making this a one liner:

var1 <- c("1", "2", "3", "4")
var2 <- c("1", "2", "3", "5")
var3 <- c("1", "2", "3", "97")
var4 <- c("1", "2", "3", "777")

df <- data.frame(var1, var2, var3, var4)

var <- c("var1", "var2", "var3", "var4")
oldvalue <- c("4", "6", "97", "777")

new.df <- apply(df[, var], 1:2, function(x) if(x %in% oldvalue) 77 else x)

print(new.df)

score 3 · Answer 4 · answered Jul 19 '21 at 21:47

If you're dealing with factors ("1") and not numeric responses (1), the forcats package may be your friend.

This will recode anything not in the notForRecode string as "77"

notForRecode <- c("1", "2", "3") #  Add others as wanted
df <- df %>% 
  mutate(across(everything(), ~ fct_other(.x, keep = notForRecode, other_level = "77")))

score 3 · Answer 5 · answered Jul 19 '21 at 23:15

A base R option

setNames(list2DF(
    lapply(
        names(df),
        function(k) {
            with(
                cw,
                replace(df[[k]], df[[k]] == oldvalue[var == k], newvalue[var == k])
            )
        }
    )
), names(df))

gives

  var1 var2 var3 var4
1    1    1    1    1
2    2    2    2    2
3    3    3    3    3
4   77    5   77   77

score 0 · Answer 6 · answered Jul 20 '21 at 03:52

0

Using purrr::pmap_dfc -

purrr::pmap_dfc(cw, ~{
  tmp <- df[..1]
  tmp[tmp == ..2] <- ..3
  tmp
})

#  var1 var2 var3 var4
#1    1    1    1    1
#2    2    2    2    2
#3    3    3    3    3
#4   77    5   77   77

answered Jul 20 '21 at 03:52

Ronak Shah

377,200
20
156
213

Thanks for this answer. This is working well for me, but when I have a variable where two or more elements are being changed, it gives me two columns instead of changing both elements in the same column. I'm not very familiar with purrr, do you know if there's a way to solve that? Ex.: var1 <- c("1", "2", "3", "4") var2 <- c("1", "2", "3", "5") var3 <- c("1", "2", "3", "97") var4 <- c("1", "2", "3", "777") df <- data.frame(var1, var2, var3, var4) var <- c("var1", "var2", "var3", "var4", "var4") oldval <- c("4", "5", "97", "777", "3") newval <- c("77", "77", "77", "77", "98") – Quinterpret Jul 21 '21 at 15:34

score 0 · Answer 7 · answered Aug 17 '21 at 11:06

# Split-Apply-Combine: 
do.call(
  cbind, 
  Map(
    function(x, y){
    replace(
      x,
      x == y$oldvalue,
      y$newvalue
    )
  },
    split.default(
      df,
      seq_len(
        ncol(
          df
        )
      )
    ),
    split(
      cw, 
      seq_len(
        nrow(
          cw
        )
      )
    )
  )
)

Recoding several variables in for loop

7 Answers7