2

I want to loop over many columns and under certain conditions, replace values. For example, if disease=0 and treatment=1, replace treatment cell with 99.

Data:

df <- data.frame(id=1:5,
                 disease1=c(1,1,0,0,0),
                 treatment1=c(1,0,1,0,0),
                 outcome1=c("survived", "died", "survived", NA,NA),
                 disease2=c(1,1,0,0,0),
                 treatment2=c(1,0,1,0,0),
                 outcome2=c("survived", "died", "survived", NA,NA))

> df
  id disease1 treatment1 outcome1 disease2 treatment2 outcome2
1  1        1          1 survived        1          1 survived
2  2        1          0     died        1          0     died
3  3        0          1 survived        0          1 survived
4  4        0          0     <NA>        0          0     <NA>
5  5        0          0     <NA>        0          0     <NA>

For a single column, case_when works well:

df %>% mutate(treatment=case_when((disease1!=1&treatment1==1)~99, TRUE~treatment1))

For multiple columns, the following works in base R:

for(i in 1:2) {
  df[,paste0("treatment",i)] <- ifelse(df[,paste0("disease",i)]!=1&df[,paste0("treatment",i)]==1,99, df[,paste0("treatment",i)])
}

I am looking for a way to do this all in tidyverse and I am having trouble finding the right recipe. Thank you in advance.

EML
  • 615
  • 4
  • 14

2 Answers2

1

Maybe consider putting in long form with pivot_longer, then would be easier to mutate across multiple columns. This would be a "tidier" approach if all disease should be together in one column (and same for treatment in 1 column, and outcome in 1 column).

library(tidyverse)

df %>%
  pivot_longer(cols = -id, names_to = c(".value", "number"), names_pattern = "(\\w+)(\\d+)") %>%
  mutate(treatment = ifelse(disease == 0 & treatment == 1, 99, treatment))
Ben
  • 28,684
  • 5
  • 23
  • 45
1

An option with names_sep in pivot_longer with case_when

library(dplyr)
library(tidyr)
pivot_longer(df, cols = -id, names_to = c('.value', 'number'), 
     names_sep="(?<=[a-z])(?=[0-9])") %>% 
    mutate(treatment = replace(treatment, !disease & treatment == 1, 99))
# A tibble: 10 x 5
#      id number disease treatment outcome 
#   <int> <chr>    <dbl>     <dbl> <chr>   
# 1     1 1            1         1 survived
# 2     1 2            1         1 survived
# 3     2 1            1         0 died    
# 4     2 2            1         0 died    
# 5     3 1            0        99 survived
# 6     3 2            0        99 survived
# 7     4 1            0         0 <NA>    
# 8     4 2            0         0 <NA>    
# 9     5 1            0         0 <NA>    
#10     5 2            0         0 <NA>    
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    This does the trick. I was hoping to find a way that maintains the wide structure and column names since there are several hundred columns involved. – EML Jun 09 '20 at 19:51