I am trying to run this code:

main_df %>% 
  fuzzy_anti_join(secondary_df, match_fun = list(`==`, `%within%`),
                  by = c("ID","Date" = "Date_Interval"))

The issue is that it returns the following error: Error in dplyr::group_by(): ! Must group by variables found in .data. ✖ Column col is not found.

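I already know why this is happening. The "Date" column is in the right table "secondary_df" and the "Date_Interval" column is in the left table "main_df". Since by = c("Date" = "Date_Interval") looks for "Date" in the left table and "Date_Interval" in the right table, neither column is found on the side where it is expected.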

However, I need to keep "main_df" as the left table and "secondary_df" as the right table. I obviously cannot simply switch my join variables like so: by = c("ID", "Date_Interval" = "Date"), because that would defeat the purpose: %within% would then test whether the interval is within the date, and I want to match where Date is within the Date interval.

I have also tried this:

test_df <- main_df %>% 
  fuzzy_anti_join(secondary_df,match_fun = list(`==`, `.y %within% .x`),
                  by = c("ID","Date" = "Date_Interval"))

But it does not handle the match_fun correctly. I still have a feeling that there is a way to fix it by changing the %within% part of match_fun so that the two sides are swapped, but I have not found it yet.

Please help!

jpsmith
marcelklib

1 Answer

After trial and error, I found this to work:

main_df %>% 
  fuzzy_anti_join(secondary_df, match_fun = list(`==`, function(x, y) y %within% x),
                  by = c("ID", "Date_Interval" = "Date"))

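You cannot change which table each side of the "by =" argument refers to: the name on the left of the = is always looked up in the left table (main_df) and the name on the right in the right table (secondary_df), which is why I had to switch Date_Interval and Date. However, you can pass a function that evaluates y within x instead of the other way around.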
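To check the flipped function on its own, outside of the join, here is a minimal sketch using only lubridate. The helper name date_within_interval and the sample dates are made up for illustration. It assumes, as the behaviour above suggests, that the left-table column arrives as x and the right-table column as y.

```r
library(lubridate)

# Hypothetical helper: x is the left-table value (an interval),
# y is the right-table value (a date). The operands are swapped
# so that the date is tested against the interval.
date_within_interval <- function(x, y) y %within% x

# Made-up sample data: two intervals and two dates, compared element-wise.
intervals <- interval(ymd(c("2024-01-01", "2024-03-01")),
                      ymd(c("2024-01-31", "2024-03-31")))
dates <- ymd(c("2024-01-15", "2024-02-15"))

result <- date_within_interval(intervals, dates)
print(result)
# TRUE for the first pair (15 Jan falls inside January),
# FALSE for the second (15 Feb is outside the March interval)
```

Giving the function a name like this also makes the match_fun list easier to read, e.g. match_fun = list(`==`, date_within_interval).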

marcelklib