I have two sets of strings, Char and Char2 in this example. I am trying to find out whether Char includes at least 2 words from Char2 (any two words can match). I have not yet gotten to the "at least 2 words" part, because I must first figure out how to match any word in each string. Any help would be greatly appreciated.

I have tried using the stringr package a couple of different ways. Please see below. I tried using similar solutions to what Robert answered with in this question: Detect multiple strings with dplyr and stringr

library(stringr)  # provides str_detect(), used in both attempts below

shopping_list <- as.data.frame(c("good apples", "bag of apples", "bag of sugar", "milk x2"))
colnames(shopping_list) <- "Char"

shopping_list2 <- as.data.frame(c("good pears", "bag of sugar", "bag of flour", "sour milk x2"))
colnames(shopping_list2) <- "Char2"

shop = cbind(shopping_list , shopping_list2)
shop$Char = as.character(shop$Char)
shop$Char2 = as.character(shop$Char2)


# First attempt
sapply(shop$Char, function(x) any(sapply(shop$Char2, str_detect, string = x)))

# Second attempt
str_detect(shop$Char, paste(shop$Char2, collapse = '|'))

I get these results:

sapply(shop$Char, function(x) any(sapply(shop$Char2, str_detect, string = x)))
  good apples bag of apples  bag of sugar       milk x2 
        FALSE         FALSE          TRUE         FALSE 


str_detect(shop$Char, paste(shop$Char2, collapse = '|'))
FALSE FALSE  TRUE FALSE

However, I am looking for these results:

FALSE TRUE TRUE TRUE

1) FALSE because only 1 word matches
2) TRUE because "bag of" is in both
3) TRUE because "bag of" is in both
4) TRUE because "milk x2" is in both

Eric Maxon

1 Answer

Here is a function that could help. It splits both strings on spaces and checks whether they share more than one word.

match_test <- function (string1, string2) {
  words1 <- unlist(strsplit(string1, ' '))
  words2 <- unlist(strsplit(string2, ' '))
  common_words <- intersect(words1, words2)
  length(common_words) > 1
}

Here is an example

string1 <- c("good apples" , "bag of apples", "bag of sugar", "milk x2")
string2 <- c("good pears" , "bag of sugar", "bag of flour", "sour milk x2")
vapply(seq_along(string1), function (k) match_test(string1[k], string2[k]), logical(1))
# [1] FALSE  TRUE  TRUE  TRUE
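
The same idea can probably be applied directly to the two data frame columns from the question. The following is only a sketch: `count_common_words()` and the `n_common` and `match` columns are hypothetical names that do not appear in the original post, and it assumes words are separated by single spaces. Returning the count rather than a logical makes it easy to change the "at least 2 words" threshold later.

```r
# Hypothetical helper (not from the original answer): for each pair of
# strings, count the distinct words that appear in both.
count_common_words <- function(string1, string2) {
  # Split every string on single spaces; each result is a list of word vectors.
  words1 <- strsplit(string1, " ", fixed = TRUE)
  words2 <- strsplit(string2, " ", fixed = TRUE)
  # Intersect the word vectors pair by pair and take the size of each overlap.
  lengths(Map(intersect, words1, words2))
}

# Same data as in the question, built as character columns directly.
shop <- data.frame(
  Char  = c("good apples", "bag of apples", "bag of sugar", "milk x2"),
  Char2 = c("good pears", "bag of sugar", "bag of flour", "sour milk x2"),
  stringsAsFactors = FALSE
)

shop$n_common <- count_common_words(shop$Char, shop$Char2)
shop$match <- shop$n_common >= 2  # "at least 2 words" threshold
shop$match
# [1] FALSE  TRUE  TRUE  TRUE
```

Note that `intersect()` drops duplicates, so a word repeated within one string is only counted once.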
niko