Problem description: I'm extracting character names from a book series. Many characters go by nicknames, parts of their names, or titles. I have a list of names that I use as patterns against all of the text. The problem is that I get multiple matches: one for the full name and another for each part of that name. There are a total of 3000 names and name variations, and I'm running them through a lot of text. The names are currently extracted in order from the longest string to the shortest.
Question:
How can I ensure that once a pattern has been extracted, the text it matched is removed from the string so that later (shorter) patterns cannot match it again?
What I get:
library(stringr)
str_extract("Mr Bean and friends", pattern = fixed(c("Mr Bean", "Bean", "Mr")))
[1] "Mr Bean" "Bean" "Mr"
What I want (I know I can't achieve this with str_extract() alone or in a single line of code):
str_extract("Mr Bean and friends", pattern = fixed(c("Mr Bean", "Bean", "Mr")))
[1] "Mr Bean" NA NA
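To make the desired behaviour concrete, here is a minimal sketch of one possible approach, not a definitive solution: loop over the patterns in their existing longest-to-shortest order, and blank out each match before trying the next pattern. The helper name extract_consuming is hypothetical. The sketch assumes the patterns are fixed strings, that a single input string is processed at a time, and that replacing a match with a space (rather than deleting it) is acceptable so that the surrounding text cannot fuse into a new match.

```r
library(stringr)

# Hypothetical helper: try each pattern in order and "consume" whatever it matches.
# `patterns` is assumed to be sorted from longest to shortest already.
extract_consuming <- function(text, patterns) {
  out <- rep(NA_character_, length(patterns))
  for (i in seq_along(patterns)) {
    pat <- fixed(patterns[i])
    if (str_detect(text, pat)) {
      out[i] <- patterns[i]
      # Replace every occurrence with a space so shorter patterns can no longer
      # match inside it, and neighbouring text is not joined together.
      text <- str_replace_all(text, pat, " ")
    }
  }
  out
}

# Returns c("Mr Bean", NA, NA) for the example above.
extract_consuming("Mr Bean and friends", c("Mr Bean", "Bean", "Mr"))
```

Because the text is modified after each pattern, a standalone "Bean" elsewhere in the same string would still be reported, while the "Bean" inside an already-matched "Mr Bean" would not. With 3000 patterns this makes one pass per pattern over each string, so it may be slow on a large corpus.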