Extract string from multiple columns into a new column

Question

I want to find a word across several columns and put it into a new column with mutate.

"data" is an example and "goal" is what I want. I tried a lot but couldn't get it to work.

 library(dplyr)
 library(stringr)

 data <- tibble(
    component1 = c(NA, NA, "Word", NA, NA, "Word"),
    component2 = c(NA, "Word", "different_word", NA, NA, "not_this")
    )

 goal <- tibble(
    component1 = c(NA, NA, "Word", NA, NA, "Word"),
    component2 = c(NA, "Word", "different_word", NA, NA, "not_this"),
    component = c(NA, "Word", "Word", NA, NA, "Word")
    )

 not_working <- data %>%
     mutate(component = across(starts_with("component"), ~ str_extract(.x, "Word")))

score 7 · Answer 1 · answered Sep 16 '22 at 09:17

For your provided data structure we could use coalesce:

library(dplyr)

data %>% 
  mutate(component = coalesce(component1, component2))

  component1 component2     component
  <chr>      <chr>          <chr>    
1 NA         NA             NA       
2 NA         Word           Word     
3 Word       different_word Word     
4 NA         NA             NA       
5 NA         NA             NA       
6 Word       not_this       Word

answered Sep 16 '22 at 09:17

TarJae

this is beautiful! – LDT Sep 16 '22 at 09:21

This does not work in the case where component1 is "different_word" and component2 is "Word". Although very succinct, I'm not sure this answer would work in every case. – Maël Sep 16 '22 at 09:23

That is true, therefore my reminder in my answer "for this data structure". – TarJae Sep 16 '22 at 09:24

Thanks for the solution. I thought about this but I needed a solution with if_any or across because I import data and the number of columns with "Word" differs. Only the name of the column always starts with "components" – Jan von Heynitz Sep 16 '22 at 09:59
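
To make the caveat raised in the comments concrete, here is a minimal sketch of the case where coalesce on the raw columns picks the wrong value. The tibble `tricky` is a hypothetical example, not part of the question's data.

```r
library(dplyr)

# Hypothetical input: in row 1 the wanted "Word" sits in the second column
tricky <- tibble(
  component1 = c("different_word", "Word"),
  component2 = c("Word", NA)
)

# coalesce() returns the first non-NA value per row, whatever it contains
result <- tricky %>%
  mutate(component = coalesce(component1, component2))

result$component
#> [1] "different_word" "Word"
```

Row 1 ends up as "different_word" because coalesce only checks for NA and never looks at the string itself.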

Maël · Accepted Answer · 2022-09-16T09:57:31.890

With if_any and str_detect:

library(dplyr)
library(stringr)
data %>% 
  mutate(component = ifelse(if_any(starts_with("component"), ~ str_detect(.x, "Word")), "Word", NA))

output

  component1 component2     component
  <chr>      <chr>          <chr>    
1 NA         NA             NA       
2 NA         Word           Word     
3 Word       different_word Word     
4 NA         NA             NA       
5 NA         NA             NA       
6 Word       not_this       Word
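
As a rough illustration of why this gives NA for the all-NA rows, the sketch below spells out the row-wise logic by hand for the two columns in the question. It assumes that if_any behaves like combining the per-column results with `|`; the names `hit` and `component` are only for illustration.

```r
library(stringr)

component1 <- c(NA, NA, "Word", NA, NA, "Word")
component2 <- c(NA, "Word", "different_word", NA, NA, "not_this")

# str_detect() returns NA for NA input; NA | TRUE is TRUE, NA | NA stays NA
hit <- str_detect(component1, "Word") | str_detect(component2, "Word")

# ifelse() propagates NA in the test, so rows without any value stay NA
component <- ifelse(hit, "Word", NA)

component
#> [1] NA     "Word" "Word" NA     NA     "Word"
```

Note that a row holding only non-matching text and NA (for example NA and "not_this") would also come out as NA, since NA | FALSE is NA.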

If you want to stick with str_extract, this would be the way to go:

data %>%
  mutate(across(starts_with("component"), ~ str_extract(.x, "Word"), 
         .names = "{.col}_extract")) %>% 
  mutate(component = coalesce(component1_extract, component2_extract),
         .keep = "unused")

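Output (note that row 6 of the input here has component1 = "different_word" and component2 = "Word", unlike the question's data; this is the case from the comments where coalesce on the raw columns fails):

# A tibble: 6 × 3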
  component1     component2     component
  <chr>          <chr>          <chr>    
1 NA             NA             NA       
2 NA             Word           Word     
3 Word           different_word Word     
4 NA             NA             NA       
5 NA             NA             NA       
6 different_word Word           Word
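
Since the asker mentions that the number of "component" columns varies between imports, the str_extract approach can probably be generalized so that no column has to be named explicitly. The following is a sketch under that assumption, not a tested drop-in: `component3` is a hypothetical extra column, and do.call is used to hand every extracted column to coalesce.

```r
library(dplyr)
library(stringr)

# Hypothetical data with a third column to mimic a varying column count
data3 <- tibble(
  component1 = c(NA, NA, "Word", NA, "different_word"),
  component2 = c(NA, "Word", "different_word", NA, "not_this"),
  component3 = c(NA, NA, NA, "not_this", "Word")
)

result <- data3 %>%
  mutate(
    component = do.call(
      coalesce,
      # across() returns a tibble (a list of columns) holding the extracted
      # "Word" or NA; coalesce() then takes the first non-NA value per row
      across(starts_with("component"), ~ str_extract(.x, "Word"))
    )
  )

result$component
#> [1] NA     "Word" "Word" NA     "Word"
```

One thing to watch: once the new column exists, starts_with("component") would match it as well, so rerunning the pipeline on `result` would include `component` itself in the selection.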
