How can I make a function that takes a column and uses that in dplyr, tidyr and ggplot?

df <- data.frame(date_col = c(1,1,2,2,3,4,4,5,5), 
                 col_a = c('a','b','a','b','a','a','b','a','b'),
                 val_col = runif(9))

How do I write a function that takes a parameter param_col instead of the hardcoded col_a?

df %>% 
  group_by(date_col, col_a) %>% 
  summarise(val_col = sum(val_col)) %>% 
  complete(col_a, date_col) %>% 
  ggplot(aes(date_col, val_col, color = col_a)) + 
  geom_line() 

The dplyr and ggplot calls work in the code outlined below. But how should the complete call be written? Or should complete_ be used?

Is there a more canonical way of doing this?

plot_nice_chart <- function(df, param_col) {

  enq_param_col <- enquo(param_col)
  str_param_col <- deparse(substitute(param_col))


  # aggregate data based on group_by_col, 
  # explicitly fill in NA's for missing to avoid interpolation
  df %>% 
     group_by(!!enq_param_col, date_col) %>%
     summarise(val_col = sum(val_col)) %>%
     complete(<what-should-be-here?>, date_col) %>%
     ggplot(aes_string("date_col", "val_col", color = str_param_col)) +
        geom_line()
}
Rickard
  • can you give some example data? – Roman Jul 05 '17 at 14:51
  • It doesn't look like you are passing any function arguments to `complete`, so it seems like things should work as-is. Is the function not working? – aosmith Jul 05 '17 at 15:24
  • What should `<what-should-be-here?>` be replaced by, so that any missing levels in the combination of group_by_col and date_col are completed? – Rickard Jul 05 '17 at 15:31

1 Answer

The development version of tidyr, tidyr_0.6.3.9000, now uses tidyeval, so if you want to update to that you could use !! as you did in group_by.

plot_nice_chart <- function(df, param_col) {

     # quosure for the dplyr/tidyr verbs, plain string for aes_string()
     enq_param_col <- enquo(param_col)
     str_param_col <- deparse(substitute(param_col))

     df %>%
          group_by(!!enq_param_col, date_col) %>%
          summarise(val_col = sum(val_col)) %>%
          ungroup() %>%
          complete(!!enq_param_col, date_col) %>%
          ggplot(aes_string("date_col", "val_col", color = str_param_col)) +
          geom_line()
}

With the current release version of tidyr, you can instead use complete_ and pass the column names as a character vector.

plot_nice_chart <- function(df, param_col) {

     enq_param_col <- enquo(param_col)
     str_param_col <- deparse(substitute(param_col))

     df %>%
          group_by(!!enq_param_col, date_col) %>%
          summarise(val_col = sum(val_col)) %>%
          ungroup() %>%
          complete_( c(str_param_col, "date_col") ) %>%
          ggplot(aes_string("date_col", "val_col", color = str_param_col)) +
          geom_line()
}
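
As a side note, here is a minimal sketch of how the same function might look with the `{{ }}` ("embrace") operator. This assumes a newer setup than the one discussed above (rlang 0.4.0 or later, ggplot2 3.0.0 or later, and a tidyr release with tidy evaluation). The name `plot_nice_chart_embrace` is made up for illustration, and the data is the sample from the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question (val_col is random).
df <- data.frame(date_col = c(1, 1, 2, 2, 3, 4, 4, 5, 5),
                 col_a = c("a", "b", "a", "b", "a", "a", "b", "a", "b"),
                 val_col = runif(9))

# Hypothetical variant: {{ param_col }} forwards the unquoted column
# to group_by(), complete() and aes() without enquo() or deparse().
plot_nice_chart_embrace <- function(df, param_col) {
  df %>%
    group_by({{ param_col }}, date_col) %>%
    summarise(val_col = sum(val_col)) %>%
    ungroup() %>%
    # add rows (with NA in val_col) for missing column/date combinations
    complete({{ param_col }}, date_col) %>%
    ggplot(aes(date_col, val_col, color = {{ param_col }})) +
    geom_line()
}

p <- plot_nice_chart_embrace(df, col_a)
```

With the sample data, date 3 has no "b" row, so `complete()` adds it with an `NA` value and the line for "b" is broken there instead of interpolated.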
aosmith
  • Thanks. wrapping string arguments in a vector with `c` works. But why? – Rickard Jul 05 '17 at 20:27
  • @Rickard I believe the `cols` argument needs to be a character vector, much like the `gather_cols` [argument of `gather_`](https://github.com/tidyverse/tidyr/issues/109#issuecomment-168024977). – aosmith Jul 05 '17 at 20:40
  • If I now want to call this function from another function, how should I proceed? E.g. `c("col_a","col_b","col_c") %>% walk(~ plot_nice_chart(df, .))` gives me "column `.` is unknown" – Rickard Jul 05 '17 at 21:22
  • It could be related to `deparse(substitute())` or how *purrr* works. Try asking a new question for this new problem. You may find it easier to get the development version of *tidyr* and work within `tidyeval`. – aosmith Jul 05 '17 at 21:27
  • @Rickard Also, if you've switched to working with strings as inputs you'll need `rlang::sym` for *dplyr* but can work directly with the string for `aes_string` and `complete` (i.e., you don't need `deparse(substitute() )` ). – aosmith Jul 05 '17 at 22:03
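
To illustrate the last comment, here is a rough sketch of a version that takes the column name as a string, which is what makes it usable from *purrr*. It assumes a tidyr release with tidy evaluation and ggplot2 3.0.0 or later. The function name `plot_nice_chart_str` and the extra column `col_b` are made up for this example, and `!!` inside `aes()` is used here in place of `aes_string()`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)

# Sample data from the question plus a hypothetical second grouping column.
df <- data.frame(date_col = c(1, 1, 2, 2, 3, 4, 4, 5, 5),
                 col_a = c("a", "b", "a", "b", "a", "a", "b", "a", "b"),
                 col_b = c("x", "x", "y", "y", "x", "y", "y", "x", "x"),
                 val_col = runif(9))

# Hypothetical string-based variant: rlang::sym() turns the column name
# into a symbol that can be unquoted with !! in dplyr, tidyr and aes().
plot_nice_chart_str <- function(df, param_col_name) {
  col_sym <- rlang::sym(param_col_name)
  df %>%
    group_by(!!col_sym, date_col) %>%
    summarise(val_col = sum(val_col)) %>%
    ungroup() %>%
    complete(!!col_sym, date_col) %>%
    ggplot(aes(date_col, val_col, color = !!col_sym)) +
    geom_line()
}

# map() collects the plots in a list; walk() would discard them.
plots <- map(c("col_a", "col_b"), ~ plot_nice_chart_str(df, .x))
```

Because the function receives an ordinary string, there is no `deparse(substitute())` step to capture the literal `.` from the lambda, which is the likely source of the "column `.` is unknown" error.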