This question is an extension of this one: Find the names contained in each sentence (not the other way around)
I'll write the relevant part here. From this:
> sentences
[1] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
[2] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21 with the help of Martin Luther"
[3] " He studied the Scripture, especially of Paul, and Evangelical doctrine"
[4] " He was present at the disputation of Leipzig (1519) as a spectator, but participated by his comments."
[5] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
toMatch <- c("Martin Luther", "Paul", "Melanchthon")
We obtained this result:
library(stringr)
lst <- str_extract_all(sentences, paste(toMatch, collapse="|"))
lst[lengths(lst)==0] <- NA
lst
#[[1]]
#[1] "Martin Luther"
#[[2]]
#[1] "Melanchthon" "Martin Luther"
#[[3]]
#[1] "Paul"
#[[4]]
#[1] NA
#[[5]]
#[1] "Melanchthon"
But for a large toMatch vector, concatenating its values with the OR operator might not be very efficient. So my question is: how can the same result be obtained using a function or a loop? That way, a regular expression like \< or \b could be placed around each toMatch value, so that only whole words are matched rather than arbitrary substrings.
I've tried this but don't know how to save the matches in lst to get the same result as above.
for (i in 1:length(sentences)) {
  for (j in 1:length(toMatch)) {
    lst <- str_extract_all(sentences[i], toMatch[j])
  }
}
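The loop above overwrites lst on every iteration, so only the last sentence/name pair survives. A minimal sketch of the kind of function being asked for is shown below, under several assumptions: the helper name extract_names is hypothetical, the sentences are shortened versions of the ones above, the names contain no regex metacharacters, and base R (regmatches with gregexpr) stands in for str_extract_all so the block runs without extra packages. Each name is wrapped in \b so only whole words match.

```r
# Shortened sample data (assumption: abbreviated from the sentences above)
sentences <- c(
  "he accepted a call to the University of Wittenberg by Martin Luther",
  " Melanchthon became professor of Greek with the help of Martin Luther",
  " He studied the Scripture, especially of Paul, and Evangelical doctrine",
  " He was present at the disputation of Leipzig (1519) as a spectator",
  " Johann Eck having attacked his views, Melanchthon replied"
)
toMatch <- c("Martin Luther", "Paul", "Melanchthon")

# Hypothetical helper: one list element per sentence, NA when nothing matches
extract_names <- function(sentences, toMatch) {
  # Wrap each name in word boundaries so "Paul" does not match "Pauline"
  patterns <- paste0("\\b", toMatch, "\\b")
  lapply(sentences, function(s) {
    # Try every pattern against this sentence and pool the hits
    hits <- unlist(lapply(patterns, function(p) {
      regmatches(s, gregexpr(p, s, perl = TRUE))[[1]]
    }))
    if (length(hits) == 0) NA_character_ else hits
  })
}

lst <- extract_names(sentences, toMatch)
lst
```

One difference from the single-pattern result: matches within a sentence come back in the order of toMatch, not in the order they appear in the sentence, so the second element here is "Martin Luther" followed by "Melanchthon".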