
I'm trying to write an R function to produce a frequency table so I can standardise formatting etc without typing it out repeatedly. The only problem is that I can't get it to evaluate a grouping variable correctly.

Here is some code to get a mini dataset to reproduce the problem:

 library(tidyverse)
 id <- sample(1:500, 5)
 factors <- sample(1:3, 5, replace = TRUE)
 data <- data.frame(id, factors)
 freqTable <- function(x, field){

     Table <- x %>%
         group_by(field) %>%
         summarise(N = n(), Percent = n()/NROW(x)*100) %>%
         mutate(C.Percent = cumsum(Percent))
     return(Table)
 }
 freqTable(data, "factors")

Which results in:

Error in resolve_vars(new_groups, tbl_vars(.data)) : unknown variable to group by : field Called from: resolve_vars(new_groups, tbl_vars(.data))

I've also tried:

freqTable <- function(x, field){
     Table <- x %>%
            group_by(paste(field)) %>%
            summarise(N = n(), Percent = n()/NROW(x)*100) %>%
            mutate(C.Percent = cumsum(Percent))
  return(Table)
}

Which works a little better (in that it doesn't error), but still doesn't actually group the factors correctly, outputting this:

# A tibble: 1 × 4
  `paste(field)`     N Percent C.Percent
           <chr> <int>   <dbl>     <dbl>
1        factors     5     100       100

So it just reports the total number of rows as a single group named "factors", rather than a count for each value in that column. Does anyone know where I'm going wrong here?

Cath
Nick
  • check out the "programming with dplyr vignette": https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html – fmic_ Jul 03 '17 at 13:55

1 Answer


Sorry - just figured this one out.

group_by_(field)

I thought it might have something to do with non-standard evaluation, but I'm not too knowledgeable about it yet.

This:

freqTable <- function(x, field){
     Table <- x %>%
            group_by_(paste(field)) %>%
            summarise(N = n(), Percent = n()/NROW(x)*100) %>%
            mutate(C.Percent = cumsum(Percent))
  return(Table)
}

Now gives this:

> freqTable(data, "factors")
# A tibble: 2 × 4
  factors     N Percent C.Percent
    <int> <int>   <dbl>     <dbl>
1       2     2      40        40
2       3     3      60       100
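
For anyone reading this later: the underscored verbs such as group_by_() have since been deprecated in dplyr, so here is a rough sketch of the same function without them. It assumes dplyr 1.0.0 or later, where across() and all_of() are available, and it uses a small fixed data frame (my own stand-in for the random one in the question) so the result is reproducible.

```r
library(dplyr)

# Sketch only: same freqTable() idea, but the column name held in `field`
# is selected with across(all_of(field)) instead of group_by_().
# Assumes dplyr >= 1.0.0.
freqTable <- function(x, field) {
  x %>%
    group_by(across(all_of(field))) %>%               # group by the column named in `field`
    summarise(N = n(), Percent = n() / nrow(x) * 100) %>%
    mutate(C.Percent = cumsum(Percent))               # running total of the percentages
}

# Fixed stand-in data (hypothetical values, not the sampled data from the question)
data <- data.frame(id = 1:5, factors = c(2L, 3L, 3L, 2L, 3L))
freqTable(data, "factors")
```

The call still takes the column name as a string, so existing calls like freqTable(data, "factors") should not need to change.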
Nick
  • How did you find out that it would work with `paste` ? with `as.character` it doesn't – moodymudskipper Jul 03 '17 at 14:08
  • Well I suspected from the error message that it was evaluating the variable name as the value, and thought that maybe if I used paste it would output the value into the space I needed it. – Nick Jul 03 '17 at 15:05
  • that's a cool trick, I wonder if it works in other functions like aggregate, that also take static names as input. I was preparing an answer where I was renaming the column before then after... If someone can explain what's going on I'll be glad! – moodymudskipper Jul 03 '17 at 15:08
  • Actually it seems to all come from you going from group_by to group_by_, paste is not needed here – moodymudskipper Jul 03 '17 at 15:23
  • From the little reading that I've done on Non-Standard Evaluation, most dplyr verbs seem to have a Non-Standard Evaluation (NSE) version and a Standard Evaluation (SE) version. The normal function name (without the underscore) is the NSE version and the underscore version is the SE version. From what I understand, the NSE version takes the variable name as typed, whereas the SE version takes the value. – Nick Jul 03 '17 at 18:28
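
The difference described in that last comment can be seen in base R without dplyr at all. The sketch below is only an illustration: show_nse() and show_se() are made-up helper names, not part of any package.

```r
# NSE-style helper: captures the expression the caller typed, not its value.
show_nse <- function(arg) deparse(substitute(arg))

# SE-style helper: uses the value that the argument evaluates to.
show_se <- function(arg) arg

field <- "factors"

show_nse(field)  # the name as typed: "field"
show_se(field)   # the value it holds: "factors"
```

This lines up with the original error: group_by(field) went looking for a column literally called field, while group_by_(field) used the string stored in it.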