I have two functions, date_diff and group_stat. I have read this tidyverse article and am trying to write simple functions that work with the pipe.

The first function computes a time difference and stores it in a new column named timex_minus_timey. When I pipe this result into the next function, I have to look up that column name so I can fill in summary_var. Is there a better way to do this?

library(tidyverse)
# 
set.seed(42)
data <- dplyr::bind_rows(
  tibble::tibble(Hosp = rep("A", 1000),
                 drg = sample(letters[1:5], 1000, replace = TRUE),
                 time1 = as.POSIXlt("2018-02-03 08:00:00", tz = "UTC") + rnorm(1000, 0, 60*60*60),
                 time2 = time1 + runif(1000, min = 10*60, max = 20*60)),

  tibble::tibble(Hosp = rep("B", 1000),
                 drg = sample(letters[1:5], 1000, replace = TRUE),
                 time1 = as.POSIXlt("2018-02-03 08:00:00", tz = "UTC") + rnorm(1000, 0, 60*60*60),
                 time2 = time1 + runif(1000, min = 10*60, max = 20*60))
)



date_diff <- function(df, stamp1, stamp2, units = "mins"){

  stamp1 <- rlang::enquo(stamp1)
  stamp2 <- rlang::enquo(stamp2)

  name <- paste0(rlang::quo_name(stamp1), "_minus_", rlang::quo_name(stamp2))

  out <- df %>%
    dplyr::mutate(!!name := as.numeric(difftime(!!stamp1, !!stamp2, units=units)))

  out
}


group_stat <- function(df, group_var, summary_var, .f) {

  func <- rlang::as_function(.f)

  group_var <-  rlang::enquo(group_var)
  summary_var <-rlang::enquo(summary_var)

  name <- paste0(rlang::quo_name(summary_var), "_", deparse(substitute(.f)))

  df %>%
    dplyr::group_by(!!group_var) %>%
    dplyr::summarise(!!name := func(!!summary_var, na.rm = TRUE))
}


data %>% 
  date_diff(time2, time1) %>%  
  group_stat(Hosp, summary_var = time2_minus_time1, mean)
#> # A tibble: 2 x 2
#>   Hosp  time2_minus_time1_mean
#>   <chr>                  <dbl>
#> 1 A                       15.1
#> 2 B                       14.9

Created on 2019-05-02 by the reprex package (v0.2.1)

xhr489
  • It's a little unclear what you mean by "better". Since `group_stat` can be used independently of `date_diff`, it needs to know which column to summarize. The alternative is to build an assumption into `group_stat`, so that it expects a particular column name. Then you can drop `summary_var`, since it can be inferred automatically. In general, I think more details are needed on what you are trying to accomplish to be able to provide an answer effectively. – Artem Sokolov May 02 '19 at 19:51
  • @ArtemSokolov: No I don't want to hardcode the names. So maybe there is not any question... – xhr489 May 02 '19 at 19:55

1 Answer

If you intend to always use these functions one after another in this way, you could have date_diff store the new column's name in an attribute and have group_stat read that attribute. Because of the if condition, the attribute is only used if it exists and the summary_var argument is not provided.

date_diff <- function(df, stamp1, stamp2, units = "mins"){

  stamp1 <- rlang::enquo(stamp1)
  stamp2 <- rlang::enquo(stamp2)

  name <- paste0(rlang::quo_name(stamp1), "_minus_", rlang::quo_name(stamp2))

  out <- df %>%
    dplyr::mutate(!!name := as.numeric(difftime(!!stamp1, !!stamp2, units=units)))

  attr(out, 'date_diff_nm') <- name
  out
}


group_stat <- function(df, group_var, summary_var, .f) {
  # Fall back to the name stored by date_diff() only when summary_var is not supplied
  if(!is.null(attr(df, 'date_diff_nm')) && missing(summary_var))
      summary_var <- attr(df, 'date_diff_nm')

  group_var <-  rlang::enquo(group_var)
  name <- paste0(summary_var, "_", deparse(substitute(.f)))

  df %>%
    dplyr::group_by(!!group_var) %>% 
    # summary_var is a string here, so look the column up via the .data pronoun
    dplyr::summarise(!!name := .f(.data[[summary_var]], na.rm = TRUE))
}


data %>% 
  date_diff(time2, time1) %>% 
  group_stat(Hosp, .f = mean)

# # A tibble: 2 x 2
#   Hosp  time2_minus_time1_mean
#   <chr>                  <dbl>
# 1 A                       15.1
# 2 B                       14.9
IceCreamToucan
  • Great, thanks! So I should add an else branch to the if for when I don't want to use group_stat with date_diff, right? – xhr489 May 02 '19 at 22:23
  • That is, the else part should still have the rlang::as_function, right? – xhr489 May 02 '19 at 22:25
  • If you aren’t using group_stat after date_diff and you provide summary_var as a character string, then no change is needed. If you want to just type the name without quotes, then yes, you would need an else branch to get the quo_name of summary_var. You don’t need to do anything with the function. – IceCreamToucan May 02 '19 at 22:26
  • So there is no need for as_function?? And I can just use .f ? – xhr489 May 02 '19 at 22:29
  • Yes, because you’re already passing the actual function as an argument. as_function converts strings and formulas to functions, so if you had written it as a formula (with ~ and .) you would need as_function. – IceCreamToucan May 02 '19 at 22:31
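
The else branch discussed in the comments above could be sketched roughly as follows. This is only one possible variation, not tested against the original data: it assumes `summary_var` is either omitted (so the `date_diff_nm` attribute is used) or given as an unquoted column name, and the `summary_nm` variable, the error message, and the `toy` tibble are made up for illustration. `rlang::as_function` is kept so that a formula could also be passed as `.f`, although the generated column name would then contain the deparsed formula.

```r
library(dplyr)

group_stat <- function(df, group_var, summary_var, .f) {
  group_var <- rlang::enquo(group_var)

  if (missing(summary_var)) {
    # No column supplied: fall back to the name stored by date_diff()
    summary_nm <- attr(df, "date_diff_nm")
    if (is.null(summary_nm)) {
      stop("`summary_var` is missing and `df` has no 'date_diff_nm' attribute")
    }
  } else {
    # Column supplied unquoted: capture it and convert it to a string
    summary_nm <- rlang::as_name(rlang::enquo(summary_var))
  }

  # Build the output column name from the expression passed as .f
  name <- paste0(summary_nm, "_", deparse(substitute(.f)))
  # Leaves functions unchanged and converts formulas to functions
  func <- rlang::as_function(.f)

  df %>%
    group_by(!!group_var) %>%
    summarise(!!name := func(.data[[summary_nm]], na.rm = TRUE))
}

# Standalone use on a small illustrative tibble (hypothetical data)
toy <- tibble::tibble(g = c("a", "a", "b"), x = c(1, 3, 10))
toy %>% group_stat(g, x, mean)
```

Used after `date_diff`, the call `group_stat(Hosp, .f = mean)` would take the first branch, and the standalone call above takes the second.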