Using a tidyeval column after join

Question

I have a function that joins two data frames and should then take the average of a column.

I can join the data, but I am not sure how to average the `x.x` and `x.y` columns in a sufficiently general way.

library(dplyr)

a <- tibble(id = 1:3, x = 4:6)
b <- tibble(id = 1:3, x = 16:18)


join_then_average <- function(df1, df2, var) {
  full_join(df1, df2, by = "id")  # I want to average x.x and x.y
}

join_then_average(a, b)
#> # A tibble: 3 x 3
#>      id   x.x   x.y
#>   <int> <int> <int>
#> 1     1     4    16
#> 2     2     5    17
#> 3     3     6    18

Conceptually I want to write something like:

mutate({{var}} := rowMeans(c({{var}}.x, {{var}}.y), na.rm = T))

but this doesn't work. I'm not sure of the best way to approach this.

Will you have only one group of columns like `x.x` and `x.y`, or could there be many, like `y.x`, `y.y` and `z.x`, `z.y`? – Ronak Shah, Nov 04 '20 at 02:31

Ronak Shah · Answer 1 · 2020-11-04T03:03:16.120


You can select the columns whose names contain `var` and take their `rowMeans`.

library(dplyr)

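join_then_average <- function(df1, df2, var) {
  # var is a string pattern such as "x"; contains() picks up x.x and x.y after the join
  full_join(df1, df2, by = "id") %>%
    # note: the result column name is hard-coded as x here
    mutate(x = rowMeans(select(., contains(var))))
}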

join_then_average(a, b, 'x')

# A tibble: 3 x 4
#     id   x.x   x.y     x
#  <int> <int> <int> <dbl>
#1     1     4    16    10
#2     2     5    17    11
#3     3     6    18    12
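
One caveat with `contains(var)` is that it matches any column whose name includes the pattern, so with many columns a short pattern such as 'x' could pick up unrelated ones. A minimal sketch of a stricter variant follows, assuming the two inputs share exactly one non-key column named `var` so that the join produces `<var>.x` and `<var>.y`. The function name `join_then_average_exact` and the `by` argument are made up for illustration and are not part of the original answer.

```r
library(dplyr)

a <- tibble(id = 1:3, x = 4:6)
b <- tibble(id = 1:3, x = 16:18)

# Hypothetical helper: `var` is a string naming the column shared by both inputs.
join_then_average_exact <- function(df1, df2, var, by = "id") {
  joined <- full_join(df1, df2, by = by)
  # Build the exact suffixed names produced by the join, e.g. "x.x" and "x.y".
  pair <- paste0(var, c(".x", ".y"))
  # all_of() errors if either column is missing instead of silently matching others.
  # `!!var :=` names the new column after the string stored in `var`.
  mutate(joined, !!var := rowMeans(select(joined, all_of(pair)), na.rm = TRUE))
}

res <- join_then_average_exact(a, b, "x")
res
```

Here `na.rm = TRUE` matters because `full_join` fills unmatched ids with `NA`, in which case the mean falls back to the one value that is present.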

edited Nov 04 '20 at 03:03

answered Nov 04 '20 at 02:34

Ronak Shah


I have a lot more columns in `a` and `b`, so `select(., -id)` isn't feasible, and I won't know the column names either. Is there something along the lines of `select(contains({{var}}))` I could do? – John-Henry Nov 04 '20 at 02:54
You are not using `var` anywhere in your function, nor are you passing it in the `join_then_average` call. What would the value of `var` be? – Ronak Shah Nov 04 '20 at 02:56
Sorry for being unclear, it would be `x`. Here's the solution. In the function: `var_str = as_label(enquo(var))` ... `mutate(x = rowMeans(select(., contains(var_str))))` – John-Henry Nov 04 '20 at 03:00

Do you really need to pass `var` unquoted? You can also use `deparse` + `substitute` to convert `var` to a string. – Ronak Shah Nov 04 '20 at 03:05

@John-Henry You can do `df %>% select({{ var }})` and then that argument supports all the tidyselect features. If you have a string, you'd do `var = all_of(my_string)`. But you can also refer to unquoted names like in tidyselect, or use selection helpers. – Lionel Henry Nov 04 '20 at 08:35
@RonakShah Do you think it makes sense to update your reply along the lines above? It would make it more general. – Lionel Henry Nov 04 '20 at 08:37
@LionelHenry But `var` is a pattern and not an actual column name. John wants to pass `x` as `var`, and it should select the columns that have `x` in their names, i.e. `x.x` and `x.y`. – Ronak Shah Nov 04 '20 at 08:41
Sorry I jumped too quickly to your reply, my bad. I agree it's better to take a string in that case, so your proposal makes sense to me. – Lionel Henry Nov 04 '20 at 10:34
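
For readers following the comments, the unquoted-argument route mentioned above (`as_label(enquo(var))`, or `deparse` + `substitute` in base R) could look roughly like this. It is only a sketch: the function name `join_then_average_unquoted` is hypothetical, and calling `as_label` and `enquo` through `rlang::` is an assumption made so the snippet does not depend on which helpers dplyr re-exports.

```r
library(dplyr)

a <- tibble(id = 1:3, x = 4:6)
b <- tibble(id = 1:3, x = 16:18)

# Hypothetical variant: `var` is passed unquoted and turned into a string pattern.
join_then_average_unquoted <- function(df1, df2, var) {
  var_str <- rlang::as_label(rlang::enquo(var))  # bare x becomes "x"
  # Base R alternative from the comments: var_str <- deparse(substitute(var))
  joined <- full_join(df1, df2, by = "id")
  # contains() matches every column whose name includes the pattern (x.x and x.y here).
  mutate(joined, !!var_str := rowMeans(select(joined, contains(var_str)), na.rm = TRUE))
}

res <- join_then_average_unquoted(a, b, x)
res
```

As the discussion concludes, taking a plain string is arguably simpler when the argument is a name pattern rather than an actual column, so this form is mostly useful when the bare-name calling style is a hard requirement.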
