3

I am trying to join two tables using dplyr within a function, where one of the variable names is defined by an argument to the function. In other dplyr functions, there is usually a version available for non-standard evaluation, e.g. select & select_, rename and rename_, etc, but not for the _join family. I found this answer, but I cannot get it to work in my code below:

df1 <- data.frame(gender = rep(c('M', 'F'), 5), var1 = letters[1:10])

new_join <- function(df, sexvar){

  df2 <- data.frame(sex = rep(c('M', 'F'), 10), var2 = letters[20:1])

  # initial attempt using usual dplyr behaviour:
  # left_join(df, df2, by = c(sexvar = 'sex'))

  # attempt using NSE:
  # left_join(df, df2, 
  #           by = c(eval(substitute(var), list(var = as.name(sexvar)))) = 'sex'))

  # attempt using setNames:
  # left_join(df, df2, by = setNames(sexvar, 'sex'))

}

new_join(df1, 'gender')

The first and second attempt give the error

Error: 'sexvar' column not found in rhs, cannot join

while the last attempt gives the error

Error: 'gender' column not found in lhs, cannot join,

which at least shows it knows I want the column gender, but somehow doesn't see it as a column heading.

Can anyone point out where I am going wrong?

Community
  • 1
  • 1
Jonny
  • 2,703
  • 2
  • 27
  • 35

2 Answers2

4

Try:

df1 <- data.frame(gender = rep(c('M', 'F'), 5), var1 = letters[1:10])

new_join <- function(df, sexvar){

  df2 <- data.frame(sex = rep(c('M', 'F'), 10), var2 = letters[20:1])

  join_vars <- c('sex')
  names(join_vars) <- sexvar

  left_join(df, df2, by = join_vars)
}

new_join(df1, 'gender')

I'm sure there's a more elegant way of getting this to work using lazy evaluation, etc., but this should get you up-and-running in the meantime.

Alex Ioannides
  • 1,204
  • 9
  • 10
  • Yes, it did, thank you very much. I'll wait and see if someone answers with such a 'more elegant way', otherwise I'll accept your answer. Thanks again – Jonny Jul 21 '16 at 12:51
2

A oneliner in your block can look like this (which is similar to your last attempt)

left_join(df, df2, by = structure("sex", names = sexvar))

It is also possible to extend this to two varialbes

left_join(df, df2, by = structure(sexvarDF1, names = sexvarDF2))
Drey
  • 3,314
  • 2
  • 21
  • 26
  • Thanks for that - I didn't know that function. Based on Alex's answer, I've already incorporated the two variables in a single line as: `left_join(df, df2, by = setNames(c('sex', 'age'), c(sexvar, agevar)))`, which seems to do a similar task – Jonny Jul 22 '16 at 10:46