This is an extension of my previous post.
I have the following dataframes (df1 and df2) that I'm trying to merge:
year <- c("2002", "2002", "1999", "1999", "1997", "2002")
state <- c("TN", "TN", "AL", "AL", "CA", "TN")
name <- c("Molly Homes, Jane Doe", "Sally", "David", "Laura", "John", "Kate")
df1 <- data.frame(year, state, name)
year <- c("2002", "1999")
state <- c("TN", "AL")
versus <- c("Homes (v. Vista)", "@laura v. dAvid")
df2 <- data.frame(year, state, versus)
And I df4 is my ideal output:
year <- c("2002", "2002", "1999", "1999", "1997", "2002")
state <- c("TN", "TN", "AL", "AL", "CA", "TN")
name <- c("Molly Homes, Jane Doe", "Sally", "David", "Laura", "John", "Kate")
versus <- c("Homes (v. Vista)", "@ george v. SALLY", "@laura v. dAvid", "@laura v. dAvid", NA, NA)
df4 <- data.frame(year, state, name, versus)
The kind responders on the last post suggested this (and a variation):
library(dplyr)
df3 <- left_join(df1,df2, by=c("year","state")) %>%
rowwise() %>%
mutate(versus:=if_else(grepl(name,versus,ignore.case=T), versus,as.character(NA)))
The problem with the above code is that it doesn't match subsets. Ideally, I'd like grepl(x, y) to match each other, vice versa. If x is in y and/or y is in x, then it's TRUE and results in the value in the "versus" column.