1

Tibble A contains keys/patterns that may be found in tibble B. My goal is to identify keys/patterns that were not found in tibble B.

I want to anti-join two tables by a matching pattern:

A <- tibble( colA = c("B12", "B19", "B202", "B87", "B61", "B55") )
B <- tibble( colB = c("aaB87aa", "bbbbB55", "B202cccc") )

I found that function from fuzzyjoin should work but I can't use it properly:

test <- regex_anti_join(A, B, by = c(colB = "colA"))

EDIT

At the end I would like to have something like this:

test
1 B12
2 B19
3 B61
amk
  • 349
  • 1
  • 9

2 Answers2

2

regex_anti_join(A, B, by = c(colA = colB)) performs a str_detect of colB inside colA. In your case this return nothing, hence the empty result.

What you need is to do the inverse: search colA inside colB.

This can be achieved specifying the matching function to the fuzzy_join function that is used by regex_anti_join:

library(fuzzyjoin)

# This is exactly the inverse of the match_fun used inside regex_anti_join
match_fun <- function(v1, v2) {
  stringr::str_detect(v2, stringr::regex(v1))
}

fuzzy_join(A, B, by = c("colA" = "colB"), match_fun=match_fun, mode = 'anti')
#> # A tibble: 3 x 1
#>   colA 
#>   <chr>
#> 1 B12  
#> 2 B19  
#> 3 B61

Created on 2020-07-17 by the reprex package (v0.3.0)

GGamba
  • 13,140
  • 3
  • 38
  • 47
1

One dplyr option given the example data could be:

A %>%
 filter(!colA %in% gsub("[a-z]", "", B$colB))

  colA 
  <chr>
1 B12  
2 B19  
3 B61  
tmfmnk
  • 38,881
  • 4
  • 47
  • 67