I'm using mutate() with str_extract() to condense a string column in a data frame into a single keyword column. My problem is that one of the strings contains two keywords, and the second one matters more to me. The regex, though, always returns whichever alternative occurs first in the string, regardless of the order in which I list the alternatives. Is there a way to change this?

MWE (without mutate()):

library(stringr)
teststring <- "abcdef"
str_extract(teststring, "b|c|a")  # returns "a", the leftmost match in the string

I'd like to match the patterns in the order I choose, not in the order they appear in the test string.

Wiktor Stribiżew
Lilith-Elina

2 Answers

You may use either stringi::stri_extract_last or stringi::stri_extract_last_regex to get the last match:

stringi::stri_extract_last_regex('zzzayyyyc.', '[abc]')
stringi::stri_extract_last('zzzayyyyc.', regex=c('[abc]'))

Both calls return [1] "c".
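As a small sketch of how this behaves on a whole column (the vector `x` below is made up for illustration), note that "last match" means last by position in the string, which is not necessarily the keyword you rank highest:

```r
library(stringi)

# Hypothetical example column; not from the question.
x <- c("zzzayyyyc.", "czzza", "none")

# Vectorised over x: returns the rightmost match per element, NA if none.
last_hits <- stri_extract_last_regex(x, "[abc]")
last_hits
# [1] "c" "a" NA
```

So this works when the keyword you care about always comes later in the string. If it can come earlier, you need an approach that tries the patterns in priority order.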

Wiktor Stribiżew

Building on Gki's answer in the comments: if you only want the first match in your chosen order, you can wrap the code like this:

teststring <- "ccccbbbaaaaqwerty"

which.min(is.na(sapply(c("b", "c", "a"), stringr::str_extract, string = teststring)))

b 
1
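
One caveat with the which.min() approach: it returns the index of the winning pattern rather than the matched text, and as far as I can tell it still returns 1 when none of the patterns match. Here is a minimal sketch of a vectorised alternative that could be dropped into mutate(). The helper name `extract_by_priority` is made up for illustration and is not part of stringr:

```r
library(stringr)

# Hypothetical helper: try each pattern in the given priority order and
# keep the first hit per string. Returns NA where no pattern matches.
extract_by_priority <- function(string, patterns) {
  result <- rep(NA_character_, length(string))
  for (p in patterns) {
    hit <- str_extract(string, p)
    # Only fill positions that no higher-priority pattern has matched yet.
    fill <- is.na(result) & !is.na(hit)
    result[fill] <- hit[fill]
  }
  result
}

priority_hits <- extract_by_priority(c("abcdef", "xyz", "ca"), c("b", "c", "a"))
priority_hits
# [1] "b" NA  "c"
```

Inside a pipeline this would look something like mutate(keyword = extract_by_priority(text, c("b", "c", "a"))), assuming a column named text.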
Daniel O