I have a string which always contains unwanted text at the end. I would like to extract everything but the unwanted text.

library(stringr)

text <- "my_text_and_unwanted_text"
output <- str_extract(text, ".*(?=<_and)")
output
# [1] NA

I am hoping that ".*" matches all the text that precedes "_and", so the intended result is "my_text", but I get NA. I have reviewed a number of posts but am having trouble finding examples that show how to match everything except the unwanted string.

marcel

2 Answers

Try reversing the problem. Instead of extracting the text you want, remove the text you do not want:

str_remove(text, "_and.*")
[1] "my_text"
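
To see how the removal approach behaves beyond the single example, here is a small sketch. The extra input strings are made up for illustration and are not from the question. It assumes stringr is installed. Note that the pattern cuts from the first "_and" onward, and that a string without "_and" comes back unchanged rather than as NA.

```r
library(stringr)

# Hypothetical inputs: the original string, one without "_and",
# and one where "_and" occurs twice
inputs <- c("my_text_and_unwanted_text", "no_suffix_here", "a_and_b_and_c")

# "_and.*" matches from the first "_and" to the end of each string
cleaned <- str_remove(inputs, "_and.*")
print(cleaned)
# [1] "my_text"        "no_suffix_here" "a"
```

If the cut should happen at the last "_and" instead, the pattern would need to change; this sketch only covers the first-occurrence case.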
LMc

Another way to think of this operation is to replace the unwanted text with nothing rather than extract everything else. This is often simpler.

text <- "my_text_and_unwanted_text"
str_replace(text, "_and.*", "")
# [1] "my_text"

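For the extracting approach, your attempt was very close. As written, "(?=<_and)" is a look-ahead for the literal text "<_and", which never occurs in the string, hence the NA. (?<= is the look-behind syntax; for a look-ahead you need (?= with no "<":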

str_extract(text, ".*(?=_and)")
# [1] "my_text"
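
As a rough side-by-side, the sketch below runs the three variants on the question's string and then shows one caveat of the extract approach. The second string "a_and_b_and_c" and the variable names are made up for illustration. It assumes stringr is installed.

```r
library(stringr)

text <- "my_text_and_unwanted_text"

# Pattern from the question: look-ahead for the literal "<_and", which never occurs
attempt <- str_extract(text, ".*(?=<_and)")
# Look-behind: the match must END right after "_and", so "_and" is included
behind <- str_extract(text, ".*(?<=_and)")
# Look-ahead: the match must be FOLLOWED by "_and", which is not consumed
ahead <- str_extract(text, ".*(?=_and)")

print(c(attempt, behind, ahead))
# [1] NA            "my_text_and" "my_text"

# Caveat: ".*" is greedy, so with a repeated "_and" it stops before the LAST one;
# the lazy ".*?" stops before the first one instead
repeated <- "a_and_b_and_c"
greedy <- str_extract(repeated, ".*(?=_and)")
lazy <- str_extract(repeated, ".*?(?=_and)")
print(c(greedy, lazy))
# [1] "a_and_b" "a"
```

So when "_and" can appear more than once, the greedy extract and the "_and.*" removal give different results; the lazy version is the one that agrees with the removal.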
Gregor Thomas