
I created a simple pivot table using the dplyr package in R. Here is my working example:

library(dplyr)
mean_mpg <- mean(mtcars$mpg)

# create a new variable that flags whether miles per (US) gallon is above the mean (1) or not (0)

mtcars <-
mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean_mpg, 1,0))

mtcars %>%
  group_by(as.factor(cyl)) %>%
  summarise(sum=sum(mpg_cat),total=n()) %>%
  mutate(percentage=sum*100/total)

Now, I want to write a function to reuse this code:

get_pivot <- function(data, predictor,target) {
  result <-
    data %>%
    group_by(as.factor(predictor)) %>%
    summarise(sum=sum(target),total=n()) %>%
    mutate(percentage=sum*100/total);

  print(result)
}

but when I call it with unquoted column names, as in get_pivot(mtcars, cyl, mpg_cat), I receive the following error:

Error in is.factor(x) : object 'cyl' not found

I also tried

get_pivot(mtcars, "cyl", "mpg_cat" )

but it did not work.

What should I do?

Gregor Thomas
Hamideh
  • dplyr uses quotation in its functions. Either implement quoting or do it the normal way without quoting. https://dplyr.tidyverse.org/articles/programming.html#quoting – qwr Jul 05 '19 at 04:29
  • Per qwr's comment, this works for me: get_pivot(mtcars, mtcars$cyl, mtcars$mpg_cat ) – Russ Thomas Jul 05 '19 at 04:31
  • `dplyr 0.8.1` imports rlang (>= 0.3.4), but the functionality in my answer requires rlang >= 0.4.0. Not sure of the relevance of the dplyr version in the title. https://github.com/tidyverse/dplyr/blob/v0.8.1/DESCRIPTION – Jon Spring Jul 05 '19 at 05:35

1 Answer

If you have rlang 0.4.0 (released June 2019) or later, you can use double curly braces {{ }} (aka "curly curly") to make programming with dplyr easier.

# Note: requires rlang 0.4.0 or later
get_pivot <- function(data, predictor, target) {
  result <-
    data %>%
    group_by(as.factor({{ predictor }})) %>%
    summarise(sum = sum({{ target }}), total = n()) %>%
    mutate(percentage = sum * 100 / total)

  print(result)
}

# Edit -- thank you Rui Barradas
> get_pivot(mtcars, cyl, mpg_cat)
# A tibble: 3 x 4
  `as.factor(cyl)`   sum total percentage
  <fct>            <dbl> <int>      <dbl>
1 4                   11    11      100  
2 6                    3     7       42.9
3 8                    0    14        0  

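This is needed because dplyr and other tidyverse packages use "non-standard evaluation", which you also encounter in some base R functions such as lm(mpg ~ factor(am), data = mtcars): a bare name like cyl is looked up as a column of the data rather than as an ordinary variable. This often makes interactive code shorter, simpler, and easier to read, but at the cost of making programming more complicated. Inside a function, the argument would otherwise be evaluated as an ordinary R variable, which is why cyl is reported as not found. The {{ }} operator instead passes the column the caller specified through to the dplyr verb, so it is looked up in the data.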
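
For readers on an rlang version older than 0.4.0, the same idea can be written in two explicit steps: capture the argument with enquo(), then unquote it with !!. The rlang 0.4.0 announcement describes {{ }} as shorthand for this pattern. The sketch below is only an illustration: the names get_pivot_enquo and mtcars_cat are hypothetical, and the grouping column is renamed to group for readability.

```r
library(dplyr)

# Hypothetical helper name, for illustration only.
get_pivot_enquo <- function(data, predictor, target) {
  # enquo() captures the expression the caller typed instead of evaluating it
  predictor <- enquo(predictor)
  target <- enquo(target)

  data %>%
    # !! unquotes the captured expression so dplyr looks it up in `data`
    group_by(group = as.factor(!!predictor)) %>%
    summarise(sum = sum(!!target), total = n()) %>%
    mutate(percentage = sum * 100 / total)
}

# Same flag as in the question: 1 if mpg is above the mean, 0 otherwise
mtcars_cat <- mtcars %>% mutate(mpg_cat = ifelse(mpg > mean(mpg), 1, 0))

res <- get_pivot_enquo(mtcars_cat, cyl, mpg_cat)
print(res)
```

The result should contain the same counts as the output shown above; only the name of the grouping column differs.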

https://www.tidyverse.org/articles/2019/06/rlang-0-4-0/
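
If you would rather pass the column names as strings, as in the get_pivot(mtcars, "cyl", "mpg_cat") call tried in the question, one option is the .data pronoun, which lets you subset the current data by name. This is a minimal sketch, assuming a dplyr version that supports .data[[ ]]; the names get_pivot_chr and mtcars_cat are hypothetical.

```r
library(dplyr)

# Hypothetical helper name, for illustration only.
# `predictor` and `target` are column names given as character strings.
get_pivot_chr <- function(data, predictor, target) {
  data %>%
    # .data[[name]] fetches the column called `name` from the current data
    group_by(group = as.factor(.data[[predictor]])) %>%
    summarise(sum = sum(.data[[target]]), total = n()) %>%
    mutate(percentage = sum * 100 / total)
}

# Same flag as in the question: 1 if mpg is above the mean, 0 otherwise
mtcars_cat <- mtcars %>% mutate(mpg_cat = ifelse(mpg > mean(mpg), 1, 0))

res_chr <- get_pivot_chr(mtcars_cat, "cyl", "mpg_cat")
print(res_chr)
```

Strings are convenient when the column names come from a vector or from user input, whereas {{ }} keeps the unquoted, interactive style of the call.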

Jon Spring
  • I don't know for sure but I think the non-standard evaluation actually comes from R's formula and not lm. – qwr Jul 05 '19 at 04:38
  • Also you can pass in data columns directly like `get_pivot(mtcars, mtcars$cyl, mtcars$mpg_cat)` – qwr Jul 05 '19 at 04:49
  • Thanks for this explanation. I keep meaning to sit down and take the time to properly understand `dplyr`'s non-standard eval, now I'm glad I waited until they brought in the new syntax. – Marius Jul 05 '19 at 04:54
  • The function doesn't produce the same results as the non-function version. This is because you are summing `target` not `mpg_cat` (or `target_cat`). – Rui Barradas Jul 05 '19 at 05:20