My question is an extension of this one: How to extract sentences containing specific person names using R
I'll write the relevant part here (slightly edited for the sake of this question):
> sentences
[1] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
[2] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21 with the help of Martin Luther"
[3] " He studied the Scripture, especially of Paul, and Evangelical doctrine"
[4] " He was present at the disputation of Leipzig (1519) as a spectator, but participated by his comments."
[5] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
toMatch <- c("Martin Luther", "Paul", "Melanchthon")
The answer provided gives the sentences that match each name:
foo<-function(Match){c(Match,sentences[grep(Match,sentences)])}
> lapply(toMatch,foo)
[[1]]
[1] "Martin Luther"
[2] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
[3] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21 with the help of Martin Luther"
[[2]]
[1] "Paul"
[2] " He studied the Scripture, especially of Paul, and Evangelical doctrine"
[[3]]
[1] "Melanchthon"
[2] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21 with the help of Martin Luther"
[3] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
lapply(toMatch,foo) passes each element of toMatch to the function foo, which searches for matches in sentences with grep (which returns the positions of the elements of the sentences vector that match) and then subsets with them: sentences[grep(Match,sentences)].
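To make the grep step concrete, here is a small sketch using shortened stand-ins for the sentences above (the shortened strings and the names idx and hits are only for illustration). Note that grep treats its pattern as a regular expression unless fixed = TRUE is passed.

```r
# Shortened, hypothetical stand-ins for the real sentences
sentences <- c("he accepted a call by Martin Luther",
               "Melanchthon became professor with the help of Martin Luther",
               "He studied the Scripture, especially of Paul")

# grep() returns the positions of the elements that match the pattern
idx <- grep("Martin Luther", sentences)

# Indexing with those positions gives the matching sentences themselves
matched <- sentences[idx]

# grepl() returns one TRUE/FALSE per sentence instead of positions
hits <- grepl("Paul", sentences)
```

grepl is the variant that becomes useful when the question is asked per sentence rather than per name.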
My question is: instead of returning every sentence that matches each element of the toMatch vector, how could we go through every sentence and return the names that match it? In other words, the other way around. I know it's a bit confusing, so the output would be this:
[1] "Martin Luther"
[2] "Melanchthon","Martin Luther"
[3] "Paul"
[4] NA #Or maybe this row doesn't exist; either is fine for me
[5] "Melanchthon"
Could this be done by altering the result already provided, or would it be easier to use a different function with lapply(sentences,FUNCTION)?
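For reference, a rough sketch of what the lapply(sentences,FUNCTION) idea might look like, under some assumptions: the sentences are shortened stand-ins, namesIn is a hypothetical helper name, and the names are matched literally with fixed = TRUE. It is only one possible approach, and it returns the names in the order of toMatch rather than in the order they appear in the sentence.

```r
# Shortened, hypothetical stand-ins for the five sentences in the question
sentences <- c("he accepted a call to the University of Wittenberg by Martin Luther",
               " Melanchthon became professor of the Greek language with the help of Martin Luther",
               " He studied the Scripture, especially of Paul, and Evangelical doctrine",
               " He was present at the disputation of Leipzig (1519) as a spectator",
               " Johann Eck having attacked his views, Melanchthon replied")
toMatch <- c("Martin Luther", "Paul", "Melanchthon")

# namesIn is a hypothetical helper: for one sentence, test every name
# with grepl() and keep the names that are found in it
namesIn <- function(sentence) {
  found <- toMatch[vapply(toMatch, grepl, logical(1), x = sentence, fixed = TRUE)]
  # Return NA when no name occurs in the sentence
  if (length(found) == 0) NA_character_ else found
}

result <- lapply(sentences, namesIn)
```

If the NA rows are not wanted, something like Filter(function(x) !all(is.na(x)), result) should drop them.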