
I have a function that joins two data frames and should then average a column they share.

I can join the data, but I am not sure how to average the x.x and x.y columns in a sufficiently general way.

library(dplyr)

a <- tibble(id = 1:3, x = 4:6)
b <- tibble(id = 1:3, x = 16:18)


join_then_average <- function(df1, df2, var) {
  full_join(df1, df2, by = "id")  # I want to average x.x and x.y
}

join_then_average(a, b)
#> # A tibble: 3 x 3
#>      id   x.x   x.y
#>   <int> <int> <int>
#> 1     1     4    16
#> 2     2     5    17
#> 3     3     6    18

Conceptually I want to write something like:

mutate({{var}} := rowMeans(c({{var}}.x, {{var}}.y), na.rm = T))

but this doesn't work. I'm not sure of the best way to approach this.

John-Henry

1 Answer


You can select the columns whose names contain `var` and take their `rowMeans`.

library(dplyr)

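# var is a string, e.g. "x"; after the join it matches x.x and x.y
join_then_average <- function(df1, df2, var) {
  full_join(df1, df2, by = "id") %>%
    # `.` is the joined data; contains(var) picks every column with var in its name
    mutate(x = rowMeans(select(., contains(var))))
}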

join_then_average(a, b, 'x')

# A tibble: 3 x 4
#     id   x.x   x.y     x
#  <int> <int> <int> <dbl>
#1     1     4    16    10
#2     2     5    17    11
#3     3     6    18    12
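
The function above always writes the result to a column called `x`. As a minimal sketch of a more general variant (not part of the original answer), the version below names the output column after `var` and selects exactly the two suffixed columns. It assumes that `var` is a string, that the column exists in both inputs, and that `full_join()` uses its default `.x`/`.y` suffixes; the name `join_then_average_named` is made up for illustration.

```r
library(dplyr)

a <- tibble(id = 1:3, x = 4:6)
b <- tibble(id = 1:3, x = 16:18)

# Hypothetical variant: `var` is a string naming a column present in both inputs.
join_then_average_named <- function(df1, df2, var) {
  # Assumption: the join keeps its default suffixes, giving e.g. "x.x" and "x.y".
  pair <- paste0(var, c(".x", ".y"))
  full_join(df1, df2, by = "id") %>%
    # `!!var :=` names the new column after the string held in `var`;
    # all_of(pair) errors if either suffixed column is missing.
    mutate(!!var := rowMeans(select(., all_of(pair)), na.rm = TRUE))
}

join_then_average_named(a, b, "x")
```

Selecting with `all_of()` rather than `contains()` avoids accidentally picking up unrelated columns whose names merely contain the same letters, at the cost of failing when the column is absent from one input.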
Ronak Shah
  • I have a lot more columns in a and b, so `select(., -id)` isn't feasible, and I won't know the column names either. Is there something along the lines of `select(contains({{var}}))` I could do? – John-Henry Nov 04 '20 at 02:54
  • You are not using `var` anywhere in your function, nor are you passing it to the `join_then_average` function. What would the value of `var` be? – Ronak Shah Nov 04 '20 at 02:56
  • Sorry for being unclear; it would be `x`. Here's the solution. In the function: `var_str = as_label(enquo(var))` ... `mutate(x = rowMeans(select(., contains(var_str))))` – John-Henry Nov 04 '20 at 03:00
  • Do you really need to pass `var` unquoted? You can also use `deparse` + `substitute` to convert `var` to a string. – Ronak Shah Nov 04 '20 at 03:05
  • @John-Henry You can do `df %>% select({{ var }})`, and then that argument supports all the tidyselect features. If you have a string, you'd do `var = all_of(my_string)`. But you can also refer to unquoted names like in tidyselect, or use selection helpers. – Lionel Henry Nov 04 '20 at 08:35
  • @RonakShah Do you think it makes sense to update your reply along the lines above? It would make it more general. – Lionel Henry Nov 04 '20 at 08:37
  • @LionelHenry But `var` is a pattern and not an actual column name. John wants to pass `x` as `var`, and it should select the columns which have `x` in their names, i.e. `x.x` and `x.y`. – Ronak Shah Nov 04 '20 at 08:41
  • Sorry I jumped too quickly to your reply, my bad. I agree it's better to take a string in that case, so your proposal makes sense to me. – Lionel Henry Nov 04 '20 at 10:34
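
For readers who want to pass the name unquoted, the approach from the comments above can be sketched roughly as follows. This is an illustration under assumptions rather than a definitive implementation: the function name `join_then_average_bare` is hypothetical, and the captured name is still used as a pattern for `contains()`, so it matches every column whose name includes it.

```r
library(dplyr)

a <- tibble(id = 1:3, x = 4:6)
b <- tibble(id = 1:3, x = 16:18)

# Hypothetical variant that accepts a bare name, e.g. x instead of "x".
join_then_average_bare <- function(df1, df2, var) {
  # Capture the unevaluated argument and convert it to a string: x -> "x".
  var_str <- rlang::as_label(rlang::enquo(var))
  full_join(df1, df2, by = "id") %>%
    # contains(var_str) is a pattern match, so here it selects x.x and x.y.
    mutate(!!var_str := rowMeans(select(., contains(var_str)), na.rm = TRUE))
}

join_then_average_bare(a, b, x)
```

In base R, `deparse(substitute(var))` would produce the same string for a simple bare name, as suggested in the comments. Taking a plain string argument remains the simpler interface when the value is only ever used as a pattern.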