I am trying to search a large text in R for keywords. When I find one, I want to extract the sentence containing the keyword plus one sentence before and one sentence after it. Ideally, I would like to be able to adjust the code to extract up to 3 sentences on either side of the keyword. Sample data below.
text <- "This is an article about random things. Usually, there are a few sentences that are irrelevant to what I am interested in. Then in the middle, there is a sentence that I want to extract. Water quality is a serious concern in Akron, Ohio. It can impact ecological systems and human health. Jon Doe is a key player in this realm. Then the article goes on talking about something else that I don't care about."
keywords <- c("water quality", "health")
So with the text above, I want to search for "water quality" and "health", and when there is a match, extract everything from "Then in the middle, there is..." through "Jon Doe is a key player in this realm."
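To make the target concrete, here is a minimal base-R sketch of the behaviour described above. It is only an illustration under some assumptions that are not part of my real data: a sentence is taken to end at ".", "!" or "?" followed by whitespace (so abbreviations like "Dr. Smith" would be split incorrectly), the keywords are treated as regular expressions, and `n` is a name I made up for the number of context sentences on each side.

```r
text <- "This is an article about random things. Usually, there are a few sentences that are irrelevant to what I am interested in. Then in the middle, there is a sentence that I want to extract. Water quality is a serious concern in Akron, Ohio. It can impact ecological systems and human health. Jon Doe is a key player in this realm. Then the article goes on talking about something else that I don't care about."
keywords <- c("water quality", "health")
n <- 1  # assumed parameter: sentences of context on each side (e.g. 3 for a wider window)

# Split into sentences: break on whitespace that follows ".", "!" or "?".
# The lookbehind requires perl = TRUE.
sentences <- strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)[[1]]

# Indices of sentences containing any keyword (case-insensitive).
pattern <- paste(keywords, collapse = "|")
hits <- grep(pattern, sentences, ignore.case = TRUE)

# Expand each hit to a window of n sentences on each side, clamped to the
# bounds of the text, then merge overlapping windows.
keep <- sort(unique(unlist(lapply(hits, function(i) {
  max(1, i - n):min(length(sentences), i + n)
}))))

result <- paste(sentences[keep], collapse = " ")
print(result)
```

With the sample text, the keywords hit the 4th and 5th sentences, so the merged window covers sentences 3 through 6.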
Finally, I want to repeat this over a number of rows with each row having its own text.
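For the row-wise part, something along these lines is what I have in mind. This is a hedged sketch, not working production code: `extract_context`, the data frame `df` and its columns are hypothetical names invented for the example, and the sentence splitting again assumes sentences end in ".", "!" or "?" followed by whitespace.

```r
# Hypothetical helper: returns the keyword sentences plus n sentences of
# context on each side, or NA when no keyword is found.
extract_context <- function(txt, keywords, n = 1) {
  sentences <- strsplit(txt, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
  hits <- grep(paste(keywords, collapse = "|"), sentences, ignore.case = TRUE)
  if (length(hits) == 0) return(NA_character_)
  keep <- unique(unlist(lapply(hits, function(i) {
    seq(max(1, i - n), min(length(sentences), i + n))
  })))
  paste(sentences[sort(keep)], collapse = " ")
}

# Hypothetical data: one document per row.
df <- data.frame(
  id = 1:2,
  text = c(
    "Nothing here. Water quality matters. It affects health. The end. Bye.",
    "No match in this row. None at all."
  ),
  stringsAsFactors = FALSE
)

# Apply the helper to every row; vapply guarantees one string per row.
df$context <- vapply(df$text, extract_context, character(1),
                     keywords = c("water quality", "health"), n = 1,
                     USE.NAMES = FALSE)
print(df$context)
```

Rows without a match get `NA` here, which seemed easier to filter afterwards than an empty string, but that choice is arbitrary.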
I've looked into using stringr/regex, but it's not giving me what I want: I can't pull the full sentences. Any ideas?
Code I've tried:
str_extract_all(text, paste0("([^\\s]+\\s){5}", keywords, "(\\s[^\\s]+){5}"))
-> that gets me a few words on either side
gsub(".*?([^\\.]*(water quality|health)[^\\.]*).*", "\\1", text, ignore.case = TRUE)
-> close also