I am trying to filter rows in R based on whether a string, or any part (element) of it, is present in another string. For example:
colA                        colB           flag
New York Metropolitan Area  New York       Yes
New York Metropolitan Area  York           Yes
New York Metropolitan Area  New York Area  Yes
New York Metropolitan Area  Los Angeles    No
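For reproducibility, the example can be built as a data frame. This is only a sketch: the name `df_example` and the choice of character (not factor) columns are assumptions for illustration, and `flag` holds the desired result rather than anything computed.

```r
# Example data from the table above; `flag` is the expected output,
# typed in by hand. `df_example` is a hypothetical name.
df_example <- data.frame(
  colA = rep("New York Metropolitan Area", 4),
  colB = c("New York", "York", "New York Area", "Los Angeles"),
  flag = c("Yes", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)
print(df_example)
```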
What I have tried so far:
- With two separate data frames (df1 and df2):
df1 <- df1 %>% fuzzy_inner_join(df2, by = c("colA" = "colB"), match_fun = str_detect)
This fails because of parentheses and other special characters in the strings, which str_detect interprets as regular-expression syntax; cleaning them all up did not help either.
- I joined the two data frames on a higher-level hierarchy column to limit the number of rows, creating a data frame df:
df[, "lookup"] <- gsub(" ", "|", df[, "colB"])
df[, "flag"] <- mapply(grepl, df[, "lookup"], df[, "colA"])
The results are not satisfactory, as only a limited number of rows are matched.
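Both attempts treat colB as a regular expression, so characters such as parentheses change what the pattern means. Below is a minimal base-R sketch of literal, word-by-word matching. It assumes the intended rule is "flag a row when any (or, optionally, every) whitespace-separated word of colB occurs in colA", which is consistent with the example table but not confirmed by it. The helper name `words_in`, its `require_all` argument, and the extra "Paris (Centre)" row are hypothetical.

```r
# Hypothetical helper: does `text` contain the words of `phrase`?
# Words are compared literally (fixed = TRUE), so "(" or "." are not
# read as regex syntax. Matching is case-sensitive and substring-based.
words_in <- function(phrase, text, require_all = FALSE) {
  words <- strsplit(trimws(phrase), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) == 0) return(FALSE)
  hits <- vapply(words, function(w) grepl(w, text, fixed = TRUE), logical(1))
  if (require_all) all(hits) else any(hits)
}

# Example rows from the question plus one made-up row with parentheses.
colA <- c(rep("New York Metropolitan Area", 4), "Paris (Centre)")
colB <- c("New York", "York", "New York Area", "Los Angeles", "(Centre)")

# Apply the helper row by row; USE.NAMES = FALSE keeps a plain logical vector.
flag <- mapply(words_in, colB, colA, USE.NAMES = FALSE)
print(data.frame(colA, colB, flag = ifelse(flag, "Yes", "No")))
```

Because the comparison is a plain substring test, a word such as "York" would also match inside "Yorkshire"; stricter word boundaries would need a different check. For the fuzzy join, a similar literal comparison might be possible with something like match_fun = function(x, y) str_detect(x, fixed(y)), but that is an untested assumption here and it would only match colB as a whole, not word by word.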
Thank you in advance.