Consider this very simple example:

library(dplyr)
library(broom)

dataframe <- tibble(id = c(1,2,3,4,5,6),
                    group = c(1,1,1,2,2,2),
                    value = c(200,400,120,300,100,100))

# A tibble: 6 x 3
     id group value
  <dbl> <dbl> <dbl>
1     1     1   200
2     2     1   400
3     3     1   120
4     4     2   300
5     5     2   100
6     6     2   100

I want to write a function that returns the upper bound of the confidence interval for the mean of value. That is:

get_ci_high <- function(data, myvar){
  confint_tidy(lm(data = data, myvar ~ 1)) %>% pull(conf.high)
}

Now, this direct call works:

confint_tidy(lm(data = dataframe, value ~ 1)) %>% pull(conf.high)
[1] 332.9999

This works as well (note the call after a group_by):

dataframe %>% group_by(group) %>% mutate(dealwithit = get_ci_high(., value))
# A tibble: 6 x 4
# Groups:   group [2]
     id group value dealwithit
  <dbl> <dbl> <dbl>      <dbl>
1     1     1   200   598.2674
2     2     1   400   598.2674
3     3     1   120   598.2674
4     4     2   300   453.5102
5     5     2   100   453.5102
6     6     2   100   453.5102

This also works, with the call wrapped in a function that uses enquo():

mindblow <- function(data, groupvar, outputvar){
  quo_groupvar <- enquo(groupvar)
  quo_outputvar <- enquo(outputvar)

  data %>% group_by(!!quo_groupvar) %>% 
    summarize(output =  get_ci_high(., !!quo_outputvar))%>% 
    ungroup()

}

mindblow(dataframe, groupvar = group, outputvar = value)
# A tibble: 2 x 2
  group   output
  <dbl>    <dbl>
1     1 598.2674
2     2 453.5102

... but this FAILS:

get_ci_high(dataframe, value)
 Error in eval(expr, envir, enclos) : object 'value' not found 

I don't understand what is wrong here. I really need a solution that works in all four cases above.

Any ideas? Many thanks!!

ℕʘʘḆḽḘ

1 Answer

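The reason is that lm() looks up the symbol in the formula, myvar, first among the columns of data and then in the environment of the function. There it finds the argument myvar, which R evaluates as the expression value in the calling environment. Inside mutate() or summarize() that expression resolves to the column, but in a direct call there is no variable named value, hence the error. What you want is for R to use the name "value" in the formula, rather than the value of the variable (which doesn't exist).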
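The lookup behind this error can be reproduced with base R alone. The following is only a minimal sketch: fit_mean is a hypothetical helper introduced for illustration (it is not part of the question's code), and a plain data.frame stands in for the tibble.

```r
# Toy data mirroring the question (plain data.frame, an assumption for brevity)
dataframe <- data.frame(id = 1:6,
                        group = c(1, 1, 1, 2, 2, 2),
                        value = c(200, 400, 120, 300, 100, 100))

# fit_mean is a hypothetical helper: the formula contains the symbol
# `myvar`, not the name of whatever column the caller had in mind
fit_mean <- function(data, myvar) {
  lm(myvar ~ 1, data = data)
}

# `myvar` is not a column of `data`, so lm() falls back to the function
# argument, which forces evaluation of `value` in the caller's environment.
# No such object exists there, so the call errors.
direct <- try(fit_mean(dataframe, value), silent = TRUE)
failed <- inherits(direct, "try-error")

# Passing the vector itself succeeds; this is effectively what happens
# when the function is called inside mutate() or summarize()
fit <- fit_mean(dataframe, dataframe$value)
mean_est <- unname(coef(fit))
```

Here failed should be TRUE, while the second call fits the intercept-only model and mean_est equals the sample mean of value.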

One solution would be to extract the name using substitute() (non-standard evaluation), and create a formula using as.formula:

get_ci_high <- function(data, myvar) {
  col_name <- as.character(substitute(myvar))
  fmla <- as.formula(paste(col_name, "~ 1"))

  confint_tidy(lm(data = data, fmla)) %>% pull(conf.high)
}

get_ci_high(dataframe, value)

However, I'd strongly recommend passing the formula value ~ 1 as the second argument instead. This is both simpler and more flexible for fitting other linear models (when you have predictors as well).

get_ci_high <- function(data, fmla) {      
  confint_tidy(lm(data = data, fmla)) %>% pull(conf.high)
}

get_ci_high(dataframe, value ~ 1)
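
As a rough sketch of how the formula version might be applied per group (the grouped case from the question), the block below makes two assumptions that are not in the original answer: base R's confint() is used as a stand-in for confint_tidy(), and split() with sapply() replaces group_by() with summarize(), so that each group's data frame is passed to the function explicitly.

```r
# Toy data mirroring the question (plain data.frame, an assumption for brevity)
dataframe <- data.frame(id = 1:6,
                        group = c(1, 1, 1, 2, 2, 2),
                        value = c(200, 400, 120, 300, 100, 100))

# Same idea as the formula-based get_ci_high(), with confint() as a
# base-R stand-in: column 2 of its result holds the upper 95% bounds
get_ci_high <- function(data, fmla) {
  unname(confint(lm(fmla, data = data))[, 2])
}

# Whole data set
overall <- get_ci_high(dataframe, value ~ 1)

# One regression per group: split() produces one data frame per group
# and each is passed to get_ci_high() on its own
by_group <- sapply(split(dataframe, dataframe$group),
                   get_ci_high, fmla = value ~ 1)
```

Because every group's rows are handed over as their own data frame, the regressions are fitted by group, and the results should agree with the 598.2674 and 453.5102 shown in the question.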
David Robinson
  • super nice, thanks David. Just to be sure, a formula is something R interprets differently than a string or a variable, right? Is this why it is working here? – ℕʘʘḆḽḘ Aug 24 '17 at 13:18
  • @ᴺᴼᴼᴮᴵᴱ Yes exactly; R keeps a formula intact, and can later pass it to other functions, without needing the variables within it to exist immediately (e.g. try out `a ~ b`). – David Robinson Aug 24 '17 at 13:20
  • argh... Problem is, your nice first solution does not work in a more general setting. Consider this function `mindblow <- function(data, groupvar, outputvar){ quo_groupvar <- enquo(groupvar) quo_outputvar <- enquo(outputvar) df_agg <- data %>% group_by( !!quo_groupvar) %>% summarize(output = get_ci_high(., !!quo_outputvar)) }` Do you see an easy fix here? – ℕʘʘḆḽḘ Aug 24 '17 at 13:38
  • question edited for clarity. I think an easy fix is to write two versions of the same function, but that looks so inefficient – ℕʘʘḆḽḘ Aug 24 '17 at 13:45
  • OK, made it work with some crazy R magic. Use `get_ci_high(.,!!rlang::get_expr(quo_outputvar)))` in the `mindblow` function. I am not sure WHY this works though :) – ℕʘʘḆḽḘ Aug 24 '17 at 14:07
  • HA! actually it does not work because the regressions are not computed by groups... I think this is worth another question – ℕʘʘḆḽḘ Aug 24 '17 at 14:34
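
To illustrate the point made in the comments about R keeping a formula intact, here is a small sketch in base R. The names a and b are deliberately undefined, and the variable name f is arbitrary.

```r
# Creating a formula does not evaluate the symbols inside it,
# so this succeeds even though neither `a` nor `b` exists
f <- a ~ b

# The formula is an ordinary object that can be stored and passed around
is_formula <- inherits(f, "formula")

# The variable names are recorded as symbols and can be inspected later
vars <- all.vars(f)
```

The variables are only looked up when a modelling function such as lm() evaluates the formula against a data frame.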