I have understood that fuzzyjoin::difference will join two tables based on absolute difference between columns. Is there an R function that will join tables based on relative/percentage differences? I could do so using a full_join() + filter() but I suspect there is a more straightforward way.
Minimal example as follows:
library(tidyverse)
library(fuzzyjoin)
df_1 <- tibble(id = c("wombat", "jerry", "akow"), scores = c(10, 50, 75))
df_2 <- tibble(id= c("wombat", "jerry", "akow"), scores = c(14, 45, 82))
# joining based on absolute difference
difference_full_join(df_1, df_2,
by=c("scores"),
max_dist= 5,
distance_col = "abs_diff" )
# A tibble: 4 x 5
id.x scores.x id.y scores.y abs_diff
<chr> <dbl> <chr> <dbl> <dbl>
1 wombat 10 wombat 14 4
2 jerry 50 jerry 45 5
3 akow 75 NA NA NA
4 NA NA akow 82 NA
## joining based on relative difference (setting 10% as a threshold)
full_join(df_1, df_2, "id") %>%
dplyr::filter( (abs(scores.x - scores.y)/scores.x) <=0.10)
# A tibble: 2 x 3
id scores.x scores.y
<chr> <dbl> <dbl>
1 jerry 50 45
2 akow 75 82