I'm trying to adapt the inner join feature of the fuzzyjoin library.

The code:

JoinedRecs <- DataToUse1 %>%
    stringdist_inner_join(DataToUse2, by = c(Full.Name1 = "Full.Name2"), max_dist = 2)

seems to work when I hard-code the variables in the "by = " clause.

However, I want to use variables, where:

Column1 <- "Full.Name1"
Column2 <- "Full.Name2"

I've tried a number of variations on possible syntax, but I always get the same error message:

Error: Must group by variables found in .data.

  • Column col is not found.

If someone could tell me the right syntax for the "by = " clause using variables rather than hard-coded names, I would be ever so grateful.

Thanks!

Bloxx

1 Answer

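The by argument takes a named character vector: the names are the columns in the left table and the values are the matching columns in the right table. We can use setNames to build that named vector from the variables: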

library(fuzzyjoin)
JoinedRecs <- DataToUse1 %>%
     stringdist_inner_join(DataToUse2, 
      by = setNames(Column2, Column1), max_dist = 2)
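
To see why this works, it may help to compare the vector that setNames builds with the hard-coded one from the question. The sketch below uses only base R; by_dynamic, by_literal and by_wrong are illustrative names, not part of fuzzyjoin. It also shows one variation that plausibly fails (an assumption, since the question does not list the exact attempts): inside c(), the left-hand side of = is taken literally as the name, so the variable Column1 is never evaluated.

```r
# Column names held in variables, as in the question
Column1 <- "Full.Name1"
Column2 <- "Full.Name2"

# setNames(values, names): values are the right-table columns,
# names are the left-table columns
by_dynamic <- setNames(Column2, Column1)

# The hard-coded version from the question
by_literal <- c(Full.Name1 = "Full.Name2")

# Both are the same named character vector
print(identical(by_dynamic, by_literal))

# Pitfall: c() does not evaluate the left-hand side of "=",
# so the name becomes the literal string "Column1"
by_wrong <- c(Column1 = Column2)
print(names(by_wrong))
```

The same pattern extends to several key columns, since setNames also accepts vectors longer than one, e.g. setNames(c("b1", "b2"), c("a1", "a2")).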

A reproducible example:

> iris2 <- data.frame(Species2 = 'setosa', value = 1)
> Column1 <- 'Species'
> Column2 <- 'Species2'
> stringdist_inner_join(head(iris), iris2, 
       by = setNames(Column2, Column1), max_dist = 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Species2 value
1          5.1         3.5          1.4         0.2  setosa   setosa     1
2          4.9         3.0          1.4         0.2  setosa   setosa     1
3          4.7         3.2          1.3         0.2  setosa   setosa     1
4          4.6         3.1          1.5         0.2  setosa   setosa     1
5          5.0         3.6          1.4         0.2  setosa   setosa     1
6          5.4         3.9          1.7         0.4  setosa   setosa     1
akrun
  • Thank you. I tried your suggestion and it seems to work slickly. I hadn't seen that approach listed in any of the documentation or examples I had seen, so I greatly appreciate your help. – ULandreman Nov 13 '21 at 00:24