I would like to create a function that merges two different tables, using two different columns from each data set.
df_1 <- data.frame(state_symbol = c('MG', 'SP', 'BA'),
city = c('Sao Paulo', 'Sao Paulo', 'Brumado'),
collected_data = c('red', 'green', 'blue'))
df_2 <- data.frame(S_Symb = c('MG', 'BA', "SP"),
municip = c('Sao Paulo', 'Brumado', 'Sao Paulo'),
population = c(123, 456, 789))
left_join(x = df_1,
y = df_2,
by = c("state_symbol" = "S_Symb",
"city" = "municip"))`
This produces the correct result:
state_symbol city collected_data population
1 MG Sao Paulo red 123
2 SP Sao Paulo green 789
3 BA Brumado blue 456
Now I want to use this code inside a function. I tried the following:
my_join <- function(tab_1,
tab_2,
df_1_city_col,
df_1_state_col,
df_2_city_col,
df_2_state_col) {
output <- dplyr::left_join(x = tab_1,
y = tab_2,
by = c({{df_1_city_col}} = {{df_2_city_col}},
{{df_1_state_col}} = {{df_2_state_col}}))
return(output)
}
But that produced the following error:
> my_join <- function(tab_1,
+ tab_2,
+ df_1_city_col,
+ df_1_state_col,
+ df_2_city_col,
+ df_2_state_col) {
+
+ output <- dplyr::left_join(x = tab_1,
+ y = tab_2,
+ by = c({{df_1_city_col}} = {{df_2_city_col}},
Error: unexpected '=' in:
" y = tab_2,
by = c({{df_1_city_col}} ="
Apparently, functions don't deal well with the notation c(something = something_else)
. The =
inside that parenthesis is too much for it and it instantaneously produces an error no matter what I try.
Just for reference, my function call would look like this:
my_join(tab_1 = df_1,
tab_2 = df_2,
df_1_city_col = 'city',
df_1_state_col = 'state_symbol',
df_2_city_col = 'municip',
df_2_state_col = 'S_Symb')
I saw some other similar answered questions, but couldn't find one where the join needs to occur between two columns from each input table.
Any help will be appreciated. Thanks.