# example
a <- data.frame(name=c("A","B","C"), KW=c(201902,201904,201905),price=c(1.99,3.02,5.00))
b <- data.frame(KW=c(201903,201904,201904),price=c(1.98,3.00,5.00),name=c("a","b","c"))

I want to join a and b with a fuzzy match on the variables KW and price, allowing a tolerance of +/- 1 for KW and +/- 0.02 for price.

The desired outcome should look like this:

  name.x   KW.x price.x   KW.y price.y name.y
1      A 201902    1.99 201903    1.98      a
2      B 201904    3.02 201904    3.00      b
3      C 201905    5.00 201904    5.00      c

I would prefer a solution that uses the fuzzyjoin package. So far I have tried the fuzzy_inner_join function, specifying my desired tolerances for KW and price through the match_fun argument, but I couldn't get it to work.

Any help with solving this problem would be appreciated.

jestor

1 Answer

You can create a Cartesian product of the two data frames with merge (using by = NULL) and then use subset to keep only the rows that satisfy both tolerance conditions.

subset(merge(a, b, by = NULL), abs(KW.x - KW.y) <= 1 & 
                               abs(price.x - price.y) <= 0.02)

#  name.x   KW.x price.x   KW.y price.y name.y
#1      A 201902    1.99 201903    1.98      a
#5      B 201904    3.02 201904    3.00      b
#9      C 201905    5.00 201904    5.00      c
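
One caveat with the price condition: prices are stored as doubles, so a difference that is exactly 0.02 on paper (3.02 vs 3.00 here) may evaluate to marginally more than 0.02, in which case the row would be dropped. A minimal sketch of the same approach with a small slack added to the tolerance; the `eps` value and the `res` name are assumptions for illustration, not part of the original answer:

```r
a <- data.frame(name = c("A", "B", "C"), KW = c(201902, 201904, 201905),
                price = c(1.99, 3.02, 5.00), stringsAsFactors = FALSE)
b <- data.frame(KW = c(201903, 201904, 201904), price = c(1.98, 3.00, 5.00),
                name = c("a", "b", "c"), stringsAsFactors = FALSE)

# Assumed slack, far below the 0.01 resolution of the prices
eps <- 1e-8

# Cross join (by = NULL), then keep pairs within both tolerances
res <- subset(merge(a, b, by = NULL),
              abs(KW.x - KW.y) <= 1 &
              abs(price.x - price.y) <= 0.02 + eps)
res
```

Rounding the difference first, e.g. `round(abs(price.x - price.y), 2) <= 0.02`, is another option if the prices always have two decimals.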
Ronak Shah
  • Thank you @Ronak, just implemented it and it works like a charm :) – jestor Mar 27 '20 at 10:27
  • @Ronak, if I were to add another variable to the matching code, one that is a string and should be matched exactly, what would that look like? – jestor Apr 02 '20 at 10:28
  • In that case you can try passing that variable to `by` in the merge, something like `subset(merge(a, b, by = 'variable'), abs(KW.x - KW.y) <= 1 & abs(price.x - price.y) <= 0.02)` – Ronak Shah Apr 02 '20 at 10:36
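
The suggestion in the last comment can be sketched as follows. The `shop` column and its values are hypothetical, added only to illustrate an exact-match key; `merge` joins on it exactly, and `subset` then applies the two tolerances to the remaining pairs:

```r
a <- data.frame(name = c("A", "B", "C"), KW = c(201902, 201904, 201905),
                price = c(1.99, 3.02, 5.00),
                shop = c("S1", "S1", "S2"),  # hypothetical exact-match key
                stringsAsFactors = FALSE)
b <- data.frame(KW = c(201903, 201904, 201904), price = c(1.98, 3.00, 5.00),
                name = c("a", "b", "c"),
                shop = c("S1", "S2", "S2"),  # hypothetical exact-match key
                stringsAsFactors = FALSE)

# Exact join on shop; the other shared columns get .x / .y suffixes
res2 <- subset(merge(a, b, by = "shop"),
               abs(KW.x - KW.y) <= 1 &
               abs(price.x - price.y) <= 0.02)
res2
```

Only pairs that share the same `shop` value are compared, so the intermediate table is smaller than the full Cartesian product.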