Adding dataset and column arguments into a function from a named vector -rlang question

Question

I have a series of functions that make some ggplot2 charts.

I have a new dataset that I want to use these functions on, to make the charts.

This new dataset has its own unique names for the columns that the functions need.

It is also likely that I will get additional new datasets (with their own different column names) in the future.

I was thinking of making a named vector where I specified the new dataset's column names to utilise (and also the name of the new dataset object itself), and I could give the values of this named vector to each of the functions.

Here is a minimal reproducible example of what I am talking about.

I know it is going to involve some combination of !!, enquo, sym... but I've tried heaps and it looks like it's got me beat.

Also, I would like to do this without altering the functions (i.e. I would still like to utilise the functions by entering in the dataset / column object names directly, as well).

library(tidyverse)
library(rlang)

# make a dataset
dif_data_name <- tibble(dif_col_name = 1:50)


# a function that only utilises a dataset
test_function_only_data <- function(dataset) {

  dataset %>% 
    pull() %>% 
    sum()
}

# a function that utilises the dataset and a specific column

test_function_with_col <- function(dataset, only_column) {

  only_column <- enquo(only_column)

  dataset %>% 
    pull(!! only_column) %>%
    sum()
}



# If I specify the dataset object, this works
test_function_only_data(dif_data_name)

# so does this (with the column name as well)
test_function_with_col(dif_data_name, dif_col_name)


# But I was hoping to use a named vector for the dataset and column arguments

function_arguments <- c("dataset" = "dif_data_name",
                         "only_column" = "dif_col_name")


# These (below) do not work. But I would like to figure out how to make them work.


# first function test

test_function_only_data(
  function_arguments[["dataset"]]
                        )


# second function test

test_function_with_col(function_arguments[["dataset"]],  
                       function_arguments[["only_column"]])

Why do you need to pass named arguments? What is wrong with your current approach? If you have another dataset in the future with different column name you can change `test_function_with_col(dif_data_name, dif_col_name)` to `test_function_with_col(another_data, another_col)` ? — Ronak Shah, Feb 19 '20 at 05:58
Cheers @RonakShah... yeah, I've got too many functions... can't be bothered to replace all the object and column references individually — Julian Tagell, Feb 19 '20 at 06:44

andrew_reece · Accepted Answer · 2020-02-19T07:56:25.627

Update (per OP comments)
Here's a full example using the data posted in the gist in this comment thread.

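library(lubridate)  # floor_date() and year(); not attached by library(tidyverse) before tidyverse 2.0.0

set.seed(123)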

new_table <- tibble(
  Date = seq.Date(as.Date("2016-01-01"), as.Date("2019-12-31"), 1)
  ) %>% 
  mutate(total_sales = rnorm(n()))

new_yearly_lines_fn <- function(sales_table, date_col, money_col) {
  date_col <- sym(date_col)
  money_col <- sym(money_col)
  sales_table <- eval(sym(sales_table))

  sales_table %>%
    group_by(year_month = floor_date({{date_col}}, "months"),
             year = year({{date_col}})) %>%
    summarise(total_sales = sum({{money_col}})) %>%
    ungroup() %>%
    ggplot() +
    aes(year_month, total_sales, col = factor(year)) +
    geom_line(stat = "identity", size = 2) +
    geom_point(stat = "identity", size = 2, col = "black")

}

function_arguments <- c("the_dataset" = "new_table",
                        "the_date_col" = "Date",
                        "the_money_col" = "total_sales")

new_yearly_lines_fn(function_arguments[["the_dataset"]], 
                    function_arguments[["the_date_col"]], 
                    function_arguments[["the_money_col"]])

FWIW, there are simpler ways to pass the information you want into a function with tidy evaluation. But here's how you'd do it with your named vector:

f <- function(named) {
  df_str <- named[["dataset"]]
  col_str <- named[["only_column"]]

  dataset <- eval(sym(df_str))

  dataset %>% 
    pull({{col_str}}) %>%
    sum()
}

f(function_arguments)
# 1275

Variants which pass in individual components of function_arguments will also work:

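f2 <- function(df_str, col_str) {
  col <- sym(col_str)            # turn the column name string into a symbol
  dataset <- eval(sym(df_str))   # look up the dataset object by its name

  dataset %>% 
    pull(!!col) %>%              # unquote the symbol so pull() sees the column
    sum()
}

f2(function_arguments[["dataset"]], function_arguments[["only_column"]])
# 1275

Note that, as of rlang 0.4.0, {{ }} is shorthand for the enquo() plus !! pattern applied to a function argument. !! itself is still available, and it also works for unquoting a symbol built with sym(). Passing the string straight to pull() happens to work as well, because pull() accepts a column name given as a string; most other dplyr verbs need the symbol.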
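As one possible reading of the "simpler ways" mentioned above, here is a minimal sketch that keeps everything as strings: the column is reached through dplyr's .data pronoun, and the named vector is spread over the function's arguments with do.call(). The function name sum_col_by_name is hypothetical, and the sketch assumes the names in function_arguments match the function's argument names exactly.

```r
library(dplyr)

dif_data_name <- tibble(dif_col_name = 1:50)

function_arguments <- c("dataset" = "dif_data_name",
                        "only_column" = "dif_col_name")

# Hypothetical helper: both arguments are strings.
sum_col_by_name <- function(dataset, only_column) {
  # Look the object up by name. get() searches this function's environment
  # first and then the environment the function was defined in.
  df <- get(dataset)

  df %>%
    summarise(total = sum(.data[[only_column]])) %>%  # .data[[ ]] takes a string
    pull(total)
}

# as.list() turns the named vector into a named list, and do.call()
# matches its elements to the arguments by name.
result <- do.call(sum_col_by_name, as.list(function_arguments))
result
# 1275
```

The trade-off is the one raised in the comments: a function written this way only accepts strings, so it cannot also be called with a bare dataset and column name.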

edited Feb 19 '20 at 07:56

answered Feb 19 '20 at 06:12

andrew_reece


thanks @andrew_reece the first function works but prevents me from utilising the function in a normal way (i.e without the named vector, just the dataset and column names directly). The second function is closer to what I'm after.. I didn't know about the {{}} replacing the !! but that seems to work, except for that the "normal" dataset object name will no longer work... – Julian Tagell Feb 19 '20 at 06:58
You're welcome. Does my second example provide you with a suitable solution? – andrew_reece Feb 19 '20 at 07:00
mate, I wish :-) went and tried it on one of my actual functions and it didn't work. Here is the gist https://gist.github.com/Tadge-Analytics/c18345cb3cc31342c3f371c945e65708 – Julian Tagell Feb 19 '20 at 07:28
Wrap your column inputs in `sym()`. I just tried it on your gist data, the plot renders correctly. `date_col <- sym(date_col); money_col <- sym(money_col)`. Update `f2()`. I'm not sure why it worked with `pull()` in your example function earlier. – andrew_reece Feb 19 '20 at 07:46
thanks heaps... I wish there was some way this function could be made to run with just regular dataset and columns, as well. Having to make a dynamic function and a regular function seems like doubling up of efforts. – Julian Tagell Feb 19 '20 at 07:56
You could just check the `class` of the argument first thing inside the function, and either `sym` or not as needed. – andrew_reece Feb 19 '20 at 15:52
Oh dear, have been trying this every which way... but something feels just slightly amiss... https://gist.github.com/Tadge-Analytics/8fbd26176fc1c998679620f02a3a7ebc – Julian Tagell Feb 19 '20 at 23:58
Look into `tryCatch` or `purrr::safely` if you're having trouble handling errors. But really, the two input types are different enough that it's probably more sanity-preserving to define two different functions. (Also, feel free to open up a new question if you're getting stuck!) – andrew_reece Feb 20 '20 at 01:13
Thanks @andrew_reece have made another question. I'm having doubts about whether it's possible https://stackoverflow.com/questions/60312049/creating-an-r-function-that-can-accept-both-datasets-and-object-names-as-well-as?noredirect=1#comment106687442_60312049 – Julian Tagell Feb 20 '20 at 09:49
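The suggestion in the comments to check the argument first and only convert when needed could be sketched roughly as follows. This is an illustration under stated assumptions, not the approach from the linked follow-up question: flexible_sum is a hypothetical name, a string passed as dataset is assumed to name an object visible from the calling environment, and a bare symbol is treated as a column whenever it matches a column name (so a column wins over a variable of the same name).

```r
library(rlang)

dif_data_name <- data.frame(dif_col_name = 1:50)

function_arguments <- c("dataset" = "dif_data_name",
                        "only_column" = "dif_col_name")

# Hypothetical helper: accepts a data frame or the name of one, and a bare
# column name or an expression that evaluates to a column name string.
flexible_sum <- function(dataset, only_column) {
  col_quo <- enquo(only_column)

  # A string is treated as the name of an object to look up in the caller.
  if (is.character(dataset)) {
    dataset <- get(dataset, envir = caller_env())
  }

  col_expr <- quo_get_expr(col_quo)
  if (is_symbol(col_expr) && as_string(col_expr) %in% names(dataset)) {
    # Bare column name, e.g. flexible_sum(df, dif_col_name)
    col_name <- as_string(col_expr)
  } else {
    # Anything else is evaluated and must yield a column name string,
    # e.g. function_arguments[["only_column"]]
    col_name <- eval_tidy(col_quo)
  }
  stopifnot(is_string(col_name), col_name %in% names(dataset))

  sum(dataset[[col_name]])
}

# Regular usage with the objects themselves
flexible_sum(dif_data_name, dif_col_name)
# 1275

# Usage with the named vector
flexible_sum(function_arguments[["dataset"]],
             function_arguments[["only_column"]])
# 1275
```

Whether this is worth the extra branching is a judgment call; as noted in the comments, two separate functions may be easier to reason about.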
