
I have example data as follows:

library(fuzzyjoin)
a <- data.frame(x = c("season", "season", "season", "package", "package"), y = c("1","2", "3", "1","6"))


b <- data.frame(x = c("season", "seson", "seson", "package", "pakkage"), w = c("1","2", "3", "2","6"))

c <- data.frame(z = c("season", "seson", "seson", "package", "pakkage"), w = c("1","2", "3", "2","6"))

So the following runs fine:

d <- stringdist_left_join(a,b, by = "x", max_dist = 2)

But joining on columns with different names does not work (note that the join is now between a and c):

e <- stringdist_left_join(a,c, by = c("x", "z"), max_dist = 2)

I would like to tell stringdist_left_join to join on two differently named columns, as in the last line of code (e), but it does not seem to accept this.

Is there any solution to this (other than copying the column and giving it another name)?

Tom

1 Answer


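You can pass a named vector to by when the column names differ: the name is the column in the left data frame (a) and the value is the column in the right one (c). You can use the following code: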

e <- stringdist_left_join(a,c, by = c("x" = "z"), max_dist = 2)

Output:

         x y       z w
1   season 1  season 1
2   season 1   seson 2
3   season 1   seson 3
4   season 2  season 1
5   season 2   seson 2
6   season 2   seson 3
7   season 3  season 1
8   season 3   seson 2
9   season 3   seson 3
10 package 1 package 2
11 package 1 pakkage 6
12 package 6 package 2
13 package 6 pakkage 6
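
It may help to see that c("x" = "z") is itself just a named character vector, so it can be stored in a variable or built from strings. The following is a minimal sketch, not part of the original answer; join_cols, join_cols2, left_col and right_col are hypothetical names chosen for illustration, and the final join only runs if fuzzyjoin is installed.

```r
# Same example data as in the question.
a <- data.frame(x = c("season", "season", "season", "package", "package"),
                y = c("1", "2", "3", "1", "6"))
c <- data.frame(z = c("season", "seson", "seson", "package", "pakkage"),
                w = c("1", "2", "3", "2", "6"))

# A named character vector: the name refers to the left table's column,
# the value to the right table's column.
join_cols <- c("x" = "z")
names(join_cols)   # name: column in a
unname(join_cols)  # value: column in c

# The same vector built from strings held in variables (hypothetical names).
left_col <- "x"
right_col <- "z"
join_cols2 <- setNames(right_col, left_col)
identical(join_cols, join_cols2)

# Passing the stored vector instead of writing it inline.
if (requireNamespace("fuzzyjoin", quietly = TRUE)) {
  e <- fuzzyjoin::stringdist_left_join(a, c, by = join_cols, max_dist = 2)
  print(e)
}
```

By contrast, an unnamed vector such as c("x", "z") is read as "join on a column x and a column z that exist in both tables", which is why the attempt in the question fails.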
Quinten
  • Thanks Quinten! So I guess then there's no way to feed the function a vector with the different names? – Tom Apr 16 '22 at 09:19
  • @Tom, No. You have to specify which column matches another column if they have different column names. – Quinten Apr 16 '22 at 09:24