
I am trying to write a custom function that joins two datasets, using quosures as the arguments in the "by = c()" portion of the left_join() function.

Here is my current attempt at the function, which fails at the "by = c(!!left_index = !!right_index))" portion. left_join expects these arguments to be quoted and quoting quosures nullifies the !!.

join_by_quosure <- function(data, left_index, var_to_impute, right_index){
  require(dplyr)

  left_index <- enquo(left_index)
  right_index <- enquo(right_index)
  var_to_impute <- enquo(var_to_impute)

  left_join(data, 
    data %>% select(!!right_index, !!var_to_impute),
    by = c(!!left_index = !!right_index))
}

Below is a working example of the call I would like to make and the join it should be equivalent to:

# join_by_quosure(data = mtcars, left_index = vs, var_to_impute = mpg, right_index = am)

left_join(mtcars, 
          mtcars %>% select(am, mpg),
          by = c("vs" = "am"))

If anyone can offer insight about how to call a quosure within the "by = c()" portion of the left_join() function I would be very grateful.

MrFlick
Joe

2 Answers

15

The c() function doesn't support rlang's !! unquoting, so you'll have to take a more traditional approach to building the by argument. You can do:

join_by_quosure <- function(data, left_index, var_to_impute, right_index){
  require(dplyr)

  left_index <- enquo(left_index)
  right_index <- enquo(right_index)
  var_to_impute <- enquo(var_to_impute)

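  # Build the named vector for `by`: the name is the left-hand column,
  # the value is the right-hand column. set_names() is called via rlang::
  # because only dplyr is attached above.
  by <- rlang::set_names(quo_name(right_index), quo_name(left_index))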

  left_join(data, 
            data %>% select(!!right_index, !!var_to_impute),
            by = by)
}
MrFlick
  • 2
    Very nice solution! But I notice that set_names() requires right_index and left_index in reverse order (i.e., right_index must be listed before left_index, even though the opposite order would be used in 'by = c("left_index" = "right_index")'). Can you shed any light on why set_names() requires the reverse order? – Joe Jan 25 '18 at 20:13
  • 2
    The parameter order for that function is simply values first, names second. It probably makes more sense when used in a pipe. But the real problem is the way the join functions "abuse" named vector notation to specify linking columns. As with most dplyr functions, things just get uglier when you need to use them programmatically rather than directly. – MrFlick Jan 25 '18 at 20:17
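To illustrate the argument order discussed in the comments above, here is a minimal sketch using the column names from the question ("vs" on the left, "am" on the right). The variable names `by_rlang` and `by_base` are only illustrative. It shows that both set_names() and base R's setNames() take the value (right-hand column) first and the name (left-hand column) second, and that the result matches the literal named vector.

```r
library(rlang)

# Values first, names second: "am" is the value (right-hand column),
# "vs" is the name (left-hand column).
by_rlang <- set_names("am", "vs")

# Base R equivalent with the same argument order.
by_base <- stats::setNames("am", "vs")

# Both should match the literal named vector from the question.
identical(by_rlang, c("vs" = "am"))
identical(by_base, c("vs" = "am"))
```

Either form can be passed directly as the by argument of left_join().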
2

Another way of passing a column into a function where the join happens with "LHS" = "RHS" could look like this:

data("mtcars")

library(tidyverse)

function_left_join <- function(x) {

  mtcars %>% 
    left_join(mtcars, by = names(select(., {{x}})))

}

head(function_left_join(mpg))
#>    mpg cyl.x disp.x hp.x drat.x  wt.x qsec.x vs.x am.x gear.x carb.x cyl.y
#> 1 21.0     6    160  110   3.90 2.620  16.46    0    1      4      4     6
#> 2 21.0     6    160  110   3.90 2.620  16.46    0    1      4      4     6
#> 3 21.0     6    160  110   3.90 2.875  17.02    0    1      4      4     6
#> 4 21.0     6    160  110   3.90 2.875  17.02    0    1      4      4     6
#> 5 22.8     4    108   93   3.85 2.320  18.61    1    1      4      1     4
#> 6 22.8     4    108   93   3.85 2.320  18.61    1    1      4      1     4
#>   disp.y hp.y drat.y  wt.y qsec.y vs.y am.y gear.y carb.y
#> 1  160.0  110   3.90 2.620  16.46    0    1      4      4
#> 2  160.0  110   3.90 2.875  17.02    0    1      4      4
#> 3  160.0  110   3.90 2.620  16.46    0    1      4      4
#> 4  160.0  110   3.90 2.875  17.02    0    1      4      4
#> 5  108.0   93   3.85 2.320  18.61    1    1      4      1
#> 6  140.8   95   3.92 3.150  22.90    1    0      4      2
Jeniffen
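The function above joins on a single column that has the same name on both sides. For the case in the question, where the left and right columns differ, one possible sketch combines the {{ }} operator with the named-vector approach from the first answer. The function name `join_on_columns` and its argument names are hypothetical, and as_name() is used here in place of quo_name() as an assumption about what a current rlang version prefers.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: join `data` to a lookup built from itself,
# matching `left_col` in the left table to `right_col` in the lookup.
join_on_columns <- function(data, left_col, right_col, value_col) {
  # Convert the bare column names to strings for the `by` vector.
  left_name  <- as_name(enquo(left_col))
  right_name <- as_name(enquo(right_col))

  # The lookup table keeps only the key and the column to bring across.
  lookup <- data %>% select({{ right_col }}, {{ value_col }})

  # Name = left-hand column, value = right-hand column.
  left_join(data, lookup, by = set_names(right_name, left_name))
}

result <- join_on_columns(mtcars, vs, am, mpg)
names(result)
```

Because vs and am are not unique keys in mtcars, this is a many-to-many join; recent dplyr versions may warn about that, and the result has many more rows than mtcars.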