
This is my second question on Stack Overflow, so all tips are welcome :)

For clinical research I have to recode many dichotomous baseline characteristics, each of which contains several variations of "yes" and "no".

Currently I am recoding these variables one by one, but that takes many lines of code, and the variations are very similar across all the variables. Unknown values and NA should be recoded to 0.

Example:

library(dplyr)

A <- c("Yes", "y", "no", "n", "UK")
B <- c("yes", "Yes", "y", "no", "no")
C <- c("Y", "y", "n", "no", "uk")

# Attempt 1: recode each variable one by one

A <- recode(A, "Yes" = "yes", "y" = "yes", "n" = "no", "UK" = "no")
B <- recode(B, "Yes" = "yes", "y" = "yes")
C <- recode(C, "Y" = "yes", "y" = "yes", "n" = "no", "uk" = "no")
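Attempt 1 repeats almost the same mapping in every call. A minimal base-R sketch that defines the mapping once as a named lookup vector and reuses it for every variable. It assumes that any spelling not in the table, and NA, should count as "no"; the names `yn_lookup` and `recode_yn` are hypothetical.

```r
# Hypothetical lookup table: names are the raw spellings, values the recoded answer
yn_lookup <- c(yes = "yes", Yes = "yes", y = "yes", Y = "yes",
               no = "no", No = "no", n = "no", N = "no",
               UK = "no", uk = "no")

# Hypothetical helper: index the lookup table by the raw values
recode_yn <- function(x) {
  out <- unname(yn_lookup[x])
  # Spellings missing from the table and NA both come back as NA;
  # assumption: treat them as "no"
  out[is.na(out)] <- "no"
  out
}

A <- c("Yes", "y", "no", "n", "UK")
B <- c("yes", "Yes", "y", "no", "no")
C <- c("Y", "y", "n", "no", "uk")

A <- recode_yn(A)
B <- recode_yn(B)
C <- recode_yn(C)
```

With this design a new spelling only needs one extra entry in `yn_lookup` instead of an edit to every `recode()` call.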

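# Attempt 2: collapse the spellings with a named list (shown for A only)

A <- factor(A)  # levels<- only recodes the values of a factor, not of a character vector
levels(A) <- list("yes" = c("Likely", "y", "Y", "Yes", "yes"), "no" = c("", "No", "UK", "no", "N", "n"))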

Is there a way to apply this list approach to a list (or vector) that holds A, B and C at once? Or is there an easier and more efficient way to recode these variables?

Any help would be great :)

2 Answers


If the vectors have the same length you can put them in a data frame; if they have different lengths, put them in a list. Then use lapply to apply the same function to each of them. You can use forcats::fct_collapse to collapse multiple levels into one.

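# Named list, so each recoded vector keeps its name
list_vec <- list(A = A, B = B, C = C)

# Apply the same collapse to every element; listed values that do not occur
# in a vector may produce an "Unknown levels" warning, but the recoding still works
list_vec <- lapply(list_vec, function(x) forcats::fct_collapse(x,
            "yes" = c("Likely", "y", "Y", "Yes", "yes"),
            "no"  = c("", "No", "UK", "no", "N", "n", "uk")))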
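The answer mentions the data-frame route for same-length vectors but only shows the list version. A minimal base-R sketch of that case, swapping fct_collapse for the `levels<-` list mapping from the question's second attempt. `collapse_yes_no` is a hypothetical helper, and mapping NA or unlisted spellings to "no" is an assumption.

```r
A <- c("Yes", "y", "no", "n", "UK")
B <- c("yes", "Yes", "y", "no", "no")
C <- c("Y", "y", "n", "no", "uk")

# Same-length vectors fit in one data frame
df <- data.frame(A, B, C)

# Hypothetical helper: collapse every spelling into "yes" or "no"
collapse_yes_no <- function(x) {
  x <- factor(x)
  levels(x) <- list(
    yes = c("Likely", "y", "Y", "Yes", "yes"),
    no  = c("", "No", "UK", "uk", "no", "N", "n")
  )
  # Levels not listed above become NA; assumption: NA counts as "no"
  x[is.na(x)] <- "no"
  x
}

# df[] <- keeps the data frame structure while replacing every column
df[] <- lapply(df, collapse_yes_no)
```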
Ronak Shah
  • This worked perfectly, thank you so much! A newbie follow-up question: what exactly do you mean by using fct_collapse to collapse multiple levels into one? There are only two levels ("yes" and "no"), right? – Julius Heemelaar Oct 20 '20 at 09:11
  • Yes, but your vector contains multiple values that should all become "yes", such as "Likely", "y", "Y", "Yes" and "yes". Before recoding, each of those is a separate level, which is why I said collapse multiple levels into one. – Ronak Shah Oct 20 '20 at 09:43
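To make "collapse multiple levels into one" concrete, a small illustration in base R. It uses `levels<-` with a named list instead of forcats, but the merge it performs is the same idea: before recoding, every spelling is its own level.

```r
# The five values of A from the question, as a factor
x <- factor(c("Yes", "y", "no", "n", "UK"))
n_before <- nlevels(x)  # 5: each spelling is a separate level

# Merge the spellings: two levels map to "yes", three to "no"
levels(x) <- list(yes = c("Yes", "y"), no = c("no", "n", "UK"))
n_after <- nlevels(x)   # 2
table(x)                # counts per collapsed level: yes 2, no 3
```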

You can use grepl with case-insensitive patterns to classify each value as "yes", "no", or neither (coded "0").

c("0", "no", "yes")[1 + grepl("^no?", A, ignore.case = TRUE) + 2 * grepl("^ye?s?", A, ignore.case = TRUE)]
#[1] "yes" "yes" "no"  "no"  "0"  
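The one-liner builds an index into `c("0", "no", "yes")`. Split into steps it reads as below (the intermediate names are only for illustration). A missing value gives FALSE for both patterns and therefore "0", which matches the "NA becomes 0" requirement. Note that the patterns are loose: `"^no?"` matches anything starting with n or N (so the text "NA" or "nope" counts as "no"), and `"^ye?s?"` matches anything starting with y or Y. Since no value can start with both letters, the index never exceeds 3.

```r
A <- c("Yes", "y", "no", "n", "UK", NA)

# ignore.case = TRUE lets the patterns match "N..." and "Y..." too
is_no  <- grepl("^no?",  A, ignore.case = TRUE)   # starts with n or N
is_yes <- grepl("^ye?s?", A, ignore.case = TRUE)  # starts with y or Y

# 1 = neither pattern matched, 2 = "no", 3 = "yes"
idx <- 1 + is_no + 2 * is_yes
res <- c("0", "no", "yes")[idx]
```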

To apply this to many vectors you can use a loop:

for (x in c("A", "B", "C")) {
  assign(x, c("0", "no", "yes")[1 + grepl("^no?", get(x), ignore.case = TRUE) +
                                  2 * grepl("^ye?s?", get(x), ignore.case = TRUE)])
}
A
#[1] "yes" "yes" "no"  "no"  "0"  
B
#[1] "yes" "yes" "yes" "no"  "no" 
C
#[1] "yes" "yes" "no"  "no"  "0"  
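The loop above modifies variables by name with `assign()` and `get()`. An alternative sketch keeps the vectors in a named list and applies the same grepl logic with `lapply()`, which avoids building variable names as strings. The helper name `to_yes_no` is hypothetical.

```r
vars <- list(
  A = c("Yes", "y", "no", "n", "UK"),
  B = c("yes", "Yes", "y", "no", "no"),
  C = c("Y", "y", "n", "no", "uk")
)

# Hypothetical helper wrapping the grepl index trick from this answer
to_yes_no <- function(x) {
  c("0", "no", "yes")[1 + grepl("^no?", x, ignore.case = TRUE) +
                        2 * grepl("^ye?s?", x, ignore.case = TRUE)]
}

# Recode every element; the names A, B, C are preserved
vars <- lapply(vars, to_yes_no)
```

If the vectors sit in a data frame instead, `df[] <- lapply(df, to_yes_no)` applies the same helper to every column.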
GKi