I have a dictionary of terms:

terms <- c("hello world", "great job")
terms <- as.data.frame(terms)

I would like to find the first match for each term in a second data.frame that contains the documents:

doc <- c("i would like to say hello worlds", "hey friends hello world everyone", "i'm looking for a great job", "great job")
docs <- as.data.frame(doc)

Desired outcome:

foundtext <- c("i would like to say hello worlds","i'm looking for a great job")
output <- cbind(terms, foundtext)

Thank you for your assistance!

vlad.rad
Dmitry Leykin

1 Answer

This solution is pretty simple and works. As I said, I didn't use regular expressions for this.

doc <- c("i would like to say hello worlds", "hey friends hello world everyone", "i'm looking for a great job", "great job")
docs <- as.data.frame(doc)
docs$match <- "not found"  # or just leave it empty
# loop over the individual terms; looping over the data.frame itself would yield whole columns
for (i in as.character(terms$terms)) {
    # fixed = TRUE matches each term literally, so no regular expressions are involved
    hit <- grepl(i, docs$doc, fixed = TRUE)
    docs$match[hit] <- i
}
# keep only the first document for each value of match
docs <- docs[!duplicated(docs$match), ]
docs
vlad.rad
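
For comparison, here is a minimal sketch of a loop-free variant that returns the table in the shape asked for in the question (one row per term, with the first document containing it). It is not the answerer's method. The helper name first_match is hypothetical, and it assumes the terms and documents are passed as plain character vectors rather than data.frames. Terms are matched literally with fixed = TRUE, and a term with no match gets NA.

```r
# Hypothetical helper: for each term, return the first document that contains it.
# Assumes `terms` and `docs` are character vectors.
first_match <- function(terms, docs) {
  found <- vapply(terms, function(term) {
    hits <- grep(term, docs, fixed = TRUE)  # indices of documents containing the literal term
    if (length(hits) > 0) docs[hits[1]] else NA_character_
  }, character(1), USE.NAMES = FALSE)
  data.frame(terms = terms, foundtext = found, stringsAsFactors = FALSE)
}

terms <- c("hello world", "great job")
doc <- c("i would like to say hello worlds", "hey friends hello world everyone",
         "i'm looking for a great job", "great job")

output <- first_match(terms, doc)
output
```

Because the match is a literal substring test, "hello world" also matches "hello worlds", which is what the desired outcome in the question expects.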