0

Example data:

example <- data.frame(matrix(NA, ncol=4, nrow=4))
names(example) <- c("reason_code", "d_01", "d_02", "d_03")
example$reason_code <- c("d_01, d_03", "d_01", "d_02", "d_02, d_03")

Giving:

  reason_code  d_01 d_02 d_03  
1  d_01, d_03   NA   NA   NA  
2        d_01   NA   NA   NA  
3        d_02   NA   NA   NA  
4  d_02, d_03   NA   NA   NA  

The reason_code column sometimes contains more than one reason code. I have created a separate column for each reason code and want to search for the column name in the reason_code column and return a logical to indicate its presence, as in the following:

library(stringr)

example$d_01 <- str_detect(example$reason_code, "d_01")

Giving:

  reason_code  d_01 d_02 d_03  
1  d_01, d_03  TRUE   NA   NA  
2        d_01  TRUE   NA   NA  
3        d_02 FALSE   NA   NA  
4  d_02, d_03 FALSE   NA   NA  

My real data has many more reason codes and as such would like to know how I would use a function to iterate over each of the columns, searching for the column name in each row of the reason_code column and returning TRUE or FALSE before moving on to the next column?

P.s. I am told by the stackoverflow A.I. that my title is likely to get downvoted or closed, however I have no idea how to explain what I'm after succinctly and accurately. I hope the example makes it clear.

www
  • 38,575
  • 12
  • 48
  • 84

1 Answers1

0

We can use lapply to loop through the columns and apply str_detect to check if string exists.

library(stringr)

example[, -1] <- lapply(names(example[, -1]), function(x) str_detect(example$reason_code, x))
example
#   reason_code  d_01  d_02  d_03
# 1  d_01, d_03  TRUE FALSE  TRUE
# 2        d_01  TRUE FALSE FALSE
# 3        d_02 FALSE  TRUE FALSE
# 4  d_02, d_03 FALSE  TRUE  TRUE

We can also begin by just using the first column. Here I demonstrate how to create the desired output (example4) from only the first column (example2).

example2 <- example[, 1, drop = FALSE]
#   reason_code
# 1  d_01, d_03
# 2        d_01
# 3        d_02
# 4  d_02, d_03

library(dplyr)
library(tidyr)

example3 <- example2 %>% mutate(ID = 1:n()) 

example4 <- example3 %>%
  left_join(example3 %>%
              separate_rows(reason_code, sep = ", ") %>%
              mutate(Flag = TRUE) %>%
              spread(reason_code, Flag, fill = FALSE),
            by = "ID") %>%
  select(-ID)
example4
#   reason_code  d_01  d_02  d_03
# 1  d_01, d_03  TRUE FALSE  TRUE
# 2        d_01  TRUE FALSE FALSE
# 3        d_02 FALSE  TRUE FALSE
# 4  d_02, d_03 FALSE  TRUE  TRUE
www
  • 38,575
  • 12
  • 48
  • 84
  • The first option provided the solution to my immediate problem. The second solution would be preferable, given that I did use quite an inelegant method to add the columns for each reason code, however given the time pressure for my current task I haven't had the opportunity to try to make it work. I will update when I do. – James McDonell Mar 07 '18 at 23:57