4

I want to find a word that can appear in several columns and write it into a new column.

"data" is an example and "goal" is what I want. I tried a lot but couldn't get it to work.

 library(dplyr)
 library(stringr)

 data <- tibble(
    component1 = c(NA, NA, "Word", NA, NA, "Word"),
    component2 = c(NA, "Word", "different_word", NA, NA, "not_this")
    )

 goal <- tibble(
    component1 = c(NA, NA, "Word", NA, NA, "Word"),
    component2 = c(NA, "Word", "different_word", NA, NA, "not_this"),
    component = c(NA, "Word", "Word", NA, NA, "Word")
    )


not_working <- data %>%
     mutate(component = across(starts_with("component"), ~ str_extract(.x, "Word")))
user438383

2 Answers

7

For your provided data structure we could use coalesce:

library(dplyr)

data %>% 
  mutate(component = coalesce(component1, component2))
  component1 component2     component
  <chr>      <chr>          <chr>    
1 NA         NA             NA       
2 NA         Word           Word     
3 Word       different_word Word     
4 NA         NA             NA       
5 NA         NA             NA       
6 Word       not_this       Word     
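
To make the behaviour concrete: coalesce() returns, position by position, the first non-missing value among its arguments, so the result depends on the argument order. A minimal sketch on plain vectors (x and y are made-up illustrative vectors, not part of the question's data):

```r
library(dplyr)

# x and y are illustrative vectors, not taken from the question
x <- c(NA, "Word", "different_word")
y <- c("Word", NA, "Word")

# Position by position, keep the first non-missing value
first_non_missing <- coalesce(x, y)
first_non_missing

# Swapping the arguments changes the result wherever both values are present
swapped <- coalesce(y, x)
swapped
```

In the third position x wins in the first call, so "different_word" is returned even though y holds "Word". That is why this approach only fits data where the wanted value is always the first non-missing one.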
TarJae
  • 2
    this is beautiful! – LDT Sep 16 '22 at 09:21
  • 3
    This does not work in the case where component1 is "different_word" and component2 is "Word". Although very succinct, I'm not sure this answer would work in every case. – Maël Sep 16 '22 at 09:23
  • 1
    That is true, therefore my reminder in my answer "for this data structure". – TarJae Sep 16 '22 at 09:24
  • 1
    Thanks for the solution. I thought about this but I needed a solution with if_any or across because I import data and the number of columns with "Word" differs. Only the name of the column always starts with "components" – Jan von Heynitz Sep 16 '22 at 09:59
3

With if_any and str_detect:

library(dplyr)
library(stringr)
data %>% 
  mutate(component = ifelse(if_any(starts_with("component"), ~ str_detect(.x, "Word")), "Word", NA))

output

  component1 component2     component
  <chr>      <chr>          <chr>    
1 NA         NA             NA       
2 NA         Word           Word     
3 Word       different_word Word     
4 NA         NA             NA       
5 NA         NA             NA       
6 Word       not_this       Word   
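
Because the columns are picked with starts_with(), the same call should carry over to imports with more matching columns. A small sketch of that, assuming a hypothetical third column component3 and made-up values (neither is in the question):

```r
library(dplyr)
library(stringr)

# str_detect() keeps missing values missing: NA, TRUE, FALSE
str_detect(c(NA, "Word", "different_word"), "Word")

# wider is a made-up table; component3 is a hypothetical extra column
wider <- tibble(
  component1 = c(NA, NA, "other"),
  component2 = c(NA, "not_this", "other"),
  component3 = c(NA, "Word", NA)
)

# Same call as above; no column names are hard-coded
result <- wider %>%
  mutate(component = ifelse(
    if_any(starts_with("component"), ~ str_detect(.x, "Word")),
    "Word", NA_character_
  ))
result$component
```

Rows where no column matches end up as NA, either because ifelse() takes the "no" branch or because the test itself is NA.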

If you want to stick with str_extract, this is one way to go:

data %>%
  mutate(across(starts_with("component"), ~ str_extract(.x, "Word"),
         .names = "{.col}_extract")) %>% 
  mutate(component = coalesce(component1_extract, component2_extract),
         .keep = "unused")
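Output (here row 6 of the input was changed to component1 = "different_word" and component2 = "Word", the case that a plain coalesce on the original columns gets wrong):
# A tibble: 6 × 3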
  component1     component2     component
  <chr>          <chr>          <chr>    
1 NA             NA             NA       
2 NA             Word           Word     
3 Word           different_word Word     
4 NA             NA             NA       
5 NA             NA             NA       
6 different_word Word           Word     
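
The coalesce call above still names component1_extract and component2_extract explicitly. If the number of matching columns varies between imports, one possible variant is to hand all extracted columns to coalesce at once with do.call. This is only a sketch under that assumption; the name general is made up for illustration:

```r
library(dplyr)
library(stringr)

# The question's example data, repeated so the block runs on its own
data <- tibble(
  component1 = c(NA, NA, "Word", NA, NA, "Word"),
  component2 = c(NA, "Word", "different_word", NA, NA, "not_this")
)

general <- data %>%
  mutate(component = do.call(
    coalesce,
    # across() returns one extracted column per match; as.list() + unname()
    # turn them into plain positional arguments for coalesce()
    unname(as.list(across(starts_with("component"), ~ str_extract(.x, "Word"))))
  ))
general$component
```

Since str_extract() returns NA wherever "Word" is absent, the column order no longer matters for the result. Note that starts_with("component") would also pick up an existing column named component, so this assumes the new column does not exist yet.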
Maël