2

This is only pseudo code, but let's assume I have 22 columns named from Q46x47_1 to Q46x47_22.

I now want to apply the following recode logic:

if Q46x47_1=0 Q46x47_2=0. 
if Q46x47_3=0 Q46x47_4=0. 
if Q46x47_5=0 Q46x47_6=0. 
if Q46x47_7=0 Q46x47_8=0. 
if Q46x47_9=0 Q46x47_10=0. 
if Q46x47_11=0 Q46x47_12=0. 
if Q46x47_13=0 Q46x47_14=0. 
if Q46x47_15=0 Q46x47_16=0. 
if Q46x47_17=0 Q46x47_18=0. 
if Q46x47_19=0 Q46x47_20=0. 
if Q46x47_21=0 Q46x47_22=0.

Of course, I don't want to type this by hand. So I was wondering if there's an elegant way to bulk recode this. I know of across, but I don't see a way how to pass each second variable with the increased suffix number.

So I started with:

df %>%
  mutate(across(num_range(prefix = "Q46x47_", range = seq(1, 21, 2)), ~if_else(. == 0, ..., .)))

My problem is (given if_else could be used here) how to specify the ... part in the if_else command where I need to pass the LHS column names.

Any ideas?


Short reproducible example with a subset of columns:

df <- data.frame(Q46x47_1 = c(1, 2, 0, 1, 0),
                 Q46x47_2 = c(1, 2, 3, 1, 0),
                 Q46x47_3 = c(1, 2, 0, 1, 0),
                 Q46x47_4 = c(1, 2, 3, 1, 0),
                 Q46x47_5 = c(1, 2, 0, 1, 0),
                 Q46x47_6 = c(1, 2, 3, 1, 0))
deschen
  • 10,012
  • 3
  • 27
  • 50

1 Answers1

3

In base R, this is more easier

nm1 <- paste0("Q46x47_", seq(1, 21, by = 2))
nm2 <- paste0("Q46x47_", seq(2, 22, by = 2))
i1 <- df[nm1] == 0
df[nm2][i1] <- 0

For the example provided

nm1 <- paste0("Q46x47_", seq(1, 5, by = 2))
nm2 <- paste0("Q46x47_", seq(2, 6, by = 2))

If we need a dplyr option, an opiton is to get the column name when we loop across the even columns specified in 'nm2', then extract the suffix digits at the end, convert to numeric, subtract 1 to create the odd column name, get the value of the column and do the replacement

library(dplyr)
library(stringr)
df %>%
   mutate(across(all_of(nm2), ~
                {
             tmp <- cur_column()
             replace(., get(str_c('Q46x47_', 
            as.integer(str_extract(tmp, "\\d+$"))-1)) == 0, 0)

           }))

-output

# Q46x47_1 Q46x47_2 Q46x47_3 Q46x47_4 Q46x47_5 Q46x47_6
#1        1        1        1        1        1        1
#2        2        2        2        2        2        2
#3        0        0        0        0        0        0
#4        1        1        1        1        1        1
#5        0        0        0        0        0        0
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Small rep. example provided with only 6 columns. Your code seems to work with it. Thanks. – deschen Feb 16 '21 at 21:08
  • 1
    OK, I see how the tidyverse approach is quite involved, so I might go with the base R solution. But good to see that it can be done. – deschen Feb 16 '21 at 21:17