I want the count and proportion (out of all rows) of each group in a data frame, after filtering. This code produces the desired output:

library(dplyr)
# tibble() replaces the deprecated data_frame()
df <- tibble(id = sample(letters[1:3], 100, replace = TRUE),
             value = rnorm(100))

summary <- filter(df, value > 0) %>%
    group_by(id) %>%
    summarize(count = n()) %>%
    ungroup() %>%
    mutate(proportion = count / sum(count))

> summary
# A tibble: 3 x 3
     id count proportion
  <chr> <int>      <dbl>
1     a    17  0.3695652
2     b    13  0.2826087
3     c    16  0.3478261

Is there an elegant way to avoid the ungroup() and mutate() steps? Something like:

summary <- filter(df, value > 0) %>%
    group_by(id) %>%
    summarize(count = n(),
              proportion = n() / [?TOTAL_ROWS()?])

I couldn't find such a function in the documentation, but I must be missing something obvious. Thanks!

Fridolin Linder

1 Answer

You can call nrow() on ., which with the %>% pipe refers to the entire data frame piped in (here, all rows that passed the filter, not just the current group):

df %>% 
    filter(value > 0) %>% 
    group_by(id) %>% 
    summarise(count = n(), proportion = count / nrow(.))

# A tibble: 3 x 3
#     id count proportion
#  <chr> <int>      <dbl>
#1     a    14  0.2592593
#2     b    22  0.4074074
#3     c    18  0.3333333
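
For comparison, here is a minimal sketch of another way to get the same two columns without an explicit ungroup(). It relies on count(), which returns an ungrouped result, so sum(count) inside mutate() is the total over all filtered rows. The seed and the name result are only illustrative assumptions, not part of the original question.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df <- tibble(id = sample(letters[1:3], 100, replace = TRUE),
             value = rnorm(100))

# count() tallies rows per id and returns an ungrouped tibble,
# so sum(count) below is the total number of filtered rows.
result <- df %>%
    filter(value > 0) %>%
    count(id, name = "count") %>%
    mutate(proportion = count / sum(count))
```

Unlike nrow(.), this version does not depend on the . placeholder, so it should also carry over to the native |> pipe.
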
Psidom