I am trying to join two tables using dplyr within a function, where one of the variable names is defined by an argument to the function. Other dplyr functions usually have a standard-evaluation version for programmatic use, e.g. select and select_, rename and rename_, etc., but the _join family does not. I found this answer, but I cannot get it to work in my code below:
df1 <- data.frame(gender = rep(c('M', 'F'), 5), var1 = letters[1:10])
new_join <- function(df, sexvar){
  df2 <- data.frame(sex = rep(c('M', 'F'), 10), var2 = letters[20:1])
  # initial attempt using usual dplyr behaviour:
  # left_join(df, df2, by = c(sexvar = 'sex'))
  # attempt using NSE:
  # left_join(df, df2,
  #           by = c(eval(substitute(var), list(var = as.name(sexvar)))) = 'sex'))
  # attempt using setNames:
  # left_join(df, df2, by = setNames(sexvar, 'sex'))
}
new_join(df1, 'gender')
The first and second attempts give the error
Error: 'sexvar' column not found in rhs, cannot join
while the last attempt gives the error
Error: 'gender' column not found in lhs, cannot join
which at least shows it knows I want the column gender, but somehow doesn't see it as a column heading.
Can anyone point out where I am going wrong?
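For what it is worth, here is a small base R sketch of the by vectors that the first and last attempts build, next to a version with the setNames arguments swapped. The names by_literal, by_setnames and by_swapped are only for illustration, and sexvar is set by hand here to stand in for the function argument.

```r
# Column name as it would arrive in the function argument.
sexvar <- 'gender'

# c() takes the name literally, so the element is named "sexvar", not "gender".
by_literal <- c(sexvar = 'sex')

# setNames(object, nm): value is sexvar, name is 'sex', i.e. c(sex = "gender").
by_setnames <- setNames(sexvar, 'sex')

# Arguments swapped: value is 'sex', name is sexvar, i.e. c(gender = "sex").
by_swapped <- setNames('sex', sexvar)

print(names(by_literal))
print(by_setnames)
print(by_swapped)
```

If the names of by refer to columns in the left table and the values to columns in the right table, as the dplyr join documentation describes, then by_swapped is presumably the form that left_join(df, df2, by = ...) needs, and my setNames attempt simply has its arguments the wrong way round.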