I'm trying to find all strings in which two words (or phrases) occur together, with other words allowed between them up to a fixed limit.
Example: I want the combination of "bought" and "watch" with at most 2 words separating them.
- I bought a beautiful and shiny watch -> not ok because there are 4 words between "bought" and "watch" ("a beautiful and shiny")
- I bought a shiny watch -> ok because there are 2 words between "bought" and "watch" ("a shiny")
I haven't found anything close to what I want in R.
To find simple words/sentences in strings, I'm using str_extract_all from stringr, as here:
library(stringr)
# Build one alternation pattern from the word list, anchored on word boundaries
my_analysis <- str_c("\\b(", str_c(my_list_of_words_and_sentences, collapse="|"), ")\\b")
df$words_and_sentences_found <- str_extract_all(df$my_strings, my_analysis)
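
To make the requirement concrete, here is a rough sketch of the kind of pattern that might express "at most 2 words in between". The helper name `proximity_pattern` and its `max_gap` argument are made up for illustration (they are not part of stringr), and the sketch assumes that both search terms are plain words without regex metacharacters and that a "word" is a run of `\w` characters, so "don't" would count as two words.

```r
library(stringr)

# Hypothetical helper (not a stringr function): builds a regex that matches
# `first`, then at most `max_gap` intervening words, then `second`.
proximity_pattern <- function(first, second, max_gap = 2) {
  paste0("\\b", first, "\\W+(?:\\w+\\W+){0,", max_gap, "}", second, "\\b")
}

my_strings <- c("I bought a beautiful and shiny watch",
                "I bought a shiny watch")

pattern <- proximity_pattern("bought", "watch", max_gap = 2)

# TRUE where "watch" follows "bought" with no more than 2 words in between
found <- str_detect(my_strings, pattern)

# The matched span itself, one list element per input string
matches <- str_extract_all(my_strings, pattern)
```

With the two example sentences above, `found` should be FALSE for the first string (4 words in between) and TRUE for the second (2 words in between). This only covers "bought" appearing before "watch"; the reverse order would need a second pattern with the arguments swapped.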