Creating dplyr function with passing column argument

Question

I am trying to pass a column name as an argument to a function that uses dplyr functions internally.

Several questions have already been asked on this theme and I tried all of their approaches, but every one of them throws some error or other.

I used `enquo()` with `!!` as given here. I then tried `!! as_label()`, following this, to get around the error the previous step produced. I also tried `group_by_()` instead of `group_by()` as mentioned here, and finally the curly-curly operator `{{ }}`.

userMaster <- structure(list(user_id = c(1, 2, 3, 4, 5), city = structure(c(5L, 
5L, 8L, 9L, 10L), .Label = c("Austin", "Boise", "Boston", "Chicago", 
"Dallas", "Denver", "Detroit", "Kansas City", "Las Vegas", "Los Angeles", 
"Manhattan", "Miami", "Minneapolis", "New York City", "Oklahoma City", 
"Omaha", "Phoenix", "Saint Louis", "San Francisco", "Washington DC"
), class = "factor"), source = structure(c(2L, 2L, 2L, 2L, 2L
), .Label = c("Adwords", "Organic", "Search Ads"), class = "factor")), row.names = c(NA, 
5L), class = "data.frame")

userCount <- function(table, metric){
  col_enquo <- enquo(metric)

  summary <- table %>% select(!! (col_enquo), source, user_id) %>%
    group_by_(!! (col_enquo), source) %>% summarise(users = n_distinct(user_id)) %>% 
    left_join(table %>% group_by(source) %>% 
                summarise(total = n_distinct(user_id))) %>% mutate(users/total)
  return(summary)
}

genderDemo <- userCount(userMaster, city)

Each attempt throws a different error:

Error: `quos(desire)` must evaluate to column positions or names, not a list

Error in !as_label(col_enquo) : invalid argument type 

Error: Quosures can only be unquoted within a quasiquotation context.

  # Bad:
  list(!!myquosure)

  # Good:
  dplyr::mutate(data, !!myquosure)
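The last two messages are what you get when `!!` ends up being evaluated by base R instead of inside a quasiquoting dplyr verb (likely what happened with `group_by_()`, which predates tidy evaluation). A minimal sketch that reproduces both outside any dplyr call, assuming only that rlang is installed; `q`, `lbl`, `err_string`, and `err_quo` are names made up for illustration:

```r
library(rlang)

q <- quo(city)
lbl <- as_label(q)  # a plain character string: "city"

# Outside a quasiquoting function such as dplyr::group_by(), R parses !!x
# as !(!x). On a string, base `!` fails with "invalid argument type".
err_string <- tryCatch(!!lbl, error = function(e) conditionMessage(e))

# On a quosure, rlang intercepts `!` and raises the
# "Quosures can only be unquoted within a quasiquotation context" error.
err_quo <- tryCatch(!!q, error = function(e) conditionMessage(e))

print(err_string)
print(err_quo)
```

In other words, `as_label()` turns the captured column into a string, so it is useful for naming things, but `!!` only does its job when the surrounding function (such as `group_by()` or `select()`) supports quasiquotation.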

Related: https://stackoverflow.com/questions/57013409/how-to-summarize-with-multiple-outputs-a-column-in-a-dataset-by-groups-efficie/57013655#57013655 — acylam, Jul 19 '19 at 15:57

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12


With rlang 0.4.0, we can use `{{ }}` (the curly-curly operator), which makes the evaluation simpler: `{{ metric }}` is shorthand for `!!enquo(metric)`.

library(rlang) #v 0.4.0
library(dplyr) #v 0.8.3

userCount <- function(tbl, metric){
  tbl %>% 
    # {{ metric }} forwards the unquoted column name supplied by the caller
    select({{metric}}, source, user_id) %>%
    group_by({{metric}}, source) %>% 
    summarise(users = n_distinct(user_id)) %>% 
    # per-source totals, joined back on the shared `source` column
    left_join(tbl %>% 
                group_by(source) %>% 
                summarise(total = n_distinct(user_id))) %>% 
    mutate(users/total)
}

genderDemo <- userCount(userMaster, desire)
genderDemo
# A tibble: 12 x 5
# Groups:   desire [4]
#   desire source users total `users/total`
#   <fct>  <fct>  <int> <int>         <dbl>
# 1 A      a          2     4         0.5  
# 2 A      b          1     3         0.333
# 3 A      c          2     5         0.4  
# 4 B      a          1     4         0.25 
# 5 B      b          1     3         0.333
# 6 B      c          1     5         0.2  
# 7 C      a          1     4         0.25 
# 8 C      b          2     3         0.667
# 9 C      c          1     5         0.2  
#10 D      a          1     4         0.25 
#11 D      b          1     3         0.333
#12 D      c          2     5         0.4

Using the OP's data (stored here as `userMaster2`, since the example data below is also called `userMaster`):

userCount(userMaster2, city)
#Joining, by = "source"
# A tibble: 4 x 5
# Groups:   city [4]
#  city        source  users total `users/total`
#  <fct>       <fct>   <int> <int>         <dbl>
#1 Dallas      Organic     2     5           0.4
#2 Kansas City Organic     1     5           0.2
#3 Las Vegas   Organic     1     5           0.2
#4 Los Angeles Organic     1     5           0.2

NOTE: the underscore-suffixed verbs such as `group_by_()` are deprecated. So either use `{{ }}` in `group_by()`, or capture the argument with `col_enquo <- enquo(metric)` and use `group_by(!! col_enquo)`.
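For comparison, a minimal sketch of the explicit `enquo()` + `!!` form next to the `{{ }}` form, assuming dplyr >= 0.8 and rlang >= 0.4.0. The toy data frame `df` and the functions `count_enquo()` and `count_curly()` are hypothetical, made up for illustration:

```r
library(rlang)
library(dplyr)

# Hypothetical toy data for this sketch only
df <- data.frame(
  grp     = c("A", "A", "B", "B", "B"),
  source  = c("a", "b", "a", "a", "b"),
  user_id = c(1, 2, 1, 3, 3)
)

# Explicit quote-and-unquote: capture the caller's argument, then unquote it
count_enquo <- function(tbl, metric) {
  metric <- enquo(metric)
  tbl %>%
    group_by(!!metric, source) %>%
    summarise(users = n_distinct(user_id))
}

# Same thing with curly-curly, which does enquo() + !! in one step
count_curly <- function(tbl, metric) {
  tbl %>%
    group_by({{ metric }}, source) %>%
    summarise(users = n_distinct(user_id))
}

res_enquo <- count_enquo(df, grp)
res_curly <- count_curly(df, grp)
print(res_curly)
```

Both functions group by the same `grp` column, so they return the same table; `{{ }}` just saves the separate `enquo()` step.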

data

set.seed(24)
userMaster <- data.frame(desire = rep(LETTERS[1:4], each = 5),
                        user_id = sample(1:5, 20, replace = TRUE),
                        source = sample(letters[1:3], 20, replace = TRUE))


answered Jul 19 '19 at 15:49

akrun


@Krithi07 Can you also update the question with a small reproducible example using `dput`? That way, I can test it – akrun Jul 19 '19 at 15:53
Error in .f(.x[[i]], ...) : object 'desire' not found – Krithi07 Jul 19 '19 at 15:55
This was the error I had gotten when I used the curly braces – Krithi07 Jul 19 '19 at 15:55
@Krithi07 Could you please update your question with an example, or do you want me to guess? – akrun Jul 19 '19 at 15:55
Not sure I can dput the data here. It's a client's data :( – Krithi07 Jul 19 '19 at 16:03
@Krithi07 I don't need your client's data. Just some data that mimics the structure – akrun Jul 19 '19 at 16:05
@Krithi07 Can you check the updated post with the output? In your example, there is no `desire` column, so I am using `city` – akrun Jul 19 '19 at 16:13
I'm getting the same error I mentioned earlier, Error in .f(.x[[i]], ...) : object 'desire' not found – Krithi07 Jul 19 '19 at 16:17
@Krithi07 Are you using the updated function with `select({{metric}}, source, user_id)`? Earlier, without testing, I had `select(metric, source, user_id)` – akrun Jul 19 '19 at 16:19
@Krithi07 I noticed that your object has the same name, `userMaster`, as the data I used, so I changed your data to a different name, in case you noticed that – akrun Jul 19 '19 at 16:20
I even tried using this function using the data you created in the post. Still giving me the same error – Krithi07 Jul 19 '19 at 16:20
@Krithi07 Okay, then I can't replicate your issue. I am using `rlang 0.4.0` and `dplyr_0.8.3` – akrun Jul 19 '19 at 16:21

Version was the problem. I upgraded and everything's working fine now. Thanks – Krithi07 Jul 19 '19 at 16:41
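The thread does not say which package was out of date. As a sketch, assuming the deciding factor is rlang 0.4.0 (the version the answer gives for `{{ }}`), the installed versions can be checked like this; `has_curly_curly()` is a hypothetical helper name:

```r
# Hypothetical helper: TRUE when the installed rlang supports {{ }}.
# The 0.4.0 threshold comes from the answer; dplyr 0.8.3 is simply the
# version the answer was tested with, so it is only reported here.
has_curly_curly <- function() {
  utils::packageVersion("rlang") >= "0.4.0"
}

message("rlang ", utils::packageVersion("rlang"),
        ", dplyr ", utils::packageVersion("dplyr"))

if (!has_curly_curly()) {
  message("rlang < 0.4.0: {{ }} is not available; ",
          "try install.packages(c(\"rlang\", \"dplyr\"))")
}
```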
