Summarizing grouped and ungrouped data in a single tibble with tidyverse in R

Question

To summarize each variable in my data, I typically create two tables: one that groups by experimental condition and a second that displays aggregated statistics across all experimental groups.

However, I'd like to display both grouped and aggregated descriptives in a single table.

I've accomplished this as shown below in the demo code — however, I'd like to find a more elegant solution, ideally using a single chain.

Does anyone know how I might accomplish this?

Thank for your help.


# Data
df <- tibble(Condition = rep(c("Group1",
                           "Group2",
                           "Group3",
                           "Group4"),20),
             comp_fail = rbinom(n = 80,
                                size = 1,
                                prob = .1)
)

# Converting Condition to factor to mimic my actual df
df$Condition <- as.factor(df$Condition)

# Descriptives grouped by condition
tbl_comp_fail_condition <- 
 df %>% 
  group_by(Condition) %>% 
  summarize("# with Zero Fails" = sum(comp_fail == 0, na.rm = T),
            "# with One Fail" = sum(comp_fail == 1, na.rm = T))

# Descriptives aggregated
tbl_comp_fail_aggregate <- 
 df %>% 
  summarize("# with Zero Fails" = sum(comp_fail == 0, na.rm = T),
            "# with One Fail" = sum(comp_fail == 1, na.rm = T))

# Joining grouped and aggregated
tbl_comp_fail_combined <- full_join(
 tbl_comp_fail_aggregate,
 tbl_comp_fail_condition) %>% 
  select(Condition, everything())

# Converting Condition to character to replace NA with All
tbl_comp_fail_combined$Condition <- as.character(tbl_comp_fail_combined$Condition)

# Replace NA with All
tbl_comp_fail_combined %>% replace_na(list(Condition = "All")) ``

score 3 · Answer 1 · answered Jun 18 '21 at 02:35

Here's an approach using dplyr::bind_rows:

df %>%
  group_by(Condition) %>%
  summarize("# with Zero Fails" = sum(comp_fail == 0, na.rm = T),
            "# with One Fail" = sum(comp_fail == 1, na.rm = T)) %>%
  bind_rows(list(summarise(., across(-Condition, sum))))

# A tibble: 5 x 3
  Condition `# with Zero Fails` `# with One Fail`
  <fct>                   <int>             <int>
1 Group1                     18                 2
2 Group2                     18                 2
3 Group3                     17                 3
4 Group4                     18                 2
5 NA                         71                 9

Or to get the summary first:

df %>%
  group_by(Condition) %>%
  summarize("# with Zero Fails" = sum(comp_fail == 0, na.rm = T),
            "# with One Fail" = sum(comp_fail == 1, na.rm = T)) %>%
  bind_rows(list(summarise(., across(-Condition, sum))), .) %>%
  relocate(Condition)

# A tibble: 5 x 3
  Condition `# with Zero Fails` `# with One Fail`
  <fct>                   <int>             <int>
1 NA                         71                 9
2 Group1                     18                 2
3 Group2                     18                 2
4 Group3                     17                 3
5 Group4                     18                 2

akrun · Answer 2 · 2021-06-18T03:11:25.290

Using adorn_totals would make this more easier

library(dplyr)
library(janitor)
df %>% 
    group_by(Condition) %>% 
    summarise("# with Zero Fails" = sum(!comp_fail, na.rm = TRUE), 
            "# with One Fail" = sum(comp_fail, na.rm = TRUE)) %>% 
    adorn_totals(name = "All")

-output

   Condition # with Zero Fails # with One Fail
    Group1                17               3
    Group2                14               6
    Group3                18               2
    Group4                14               6
       All                63              17

OP's output

out
# A tibble: 5 x 3
  Condition `# with Zero Fails` `# with One Fail`
  <chr>                   <int>             <int>
1 All                        63                17
2 Group1                     17                 3
3 Group2                     14                 6
4 Group3                     18                 2
5 Group4                     14                 6

Or another option is count with pivot_wider (which would be more general in case there are more than 2 groups)

library(tidyr)
df %>%
     count(Condition, comp_fail) %>% 
     pivot_wider(names_from = comp_fail, values_from = n) %>% 
     adorn_totals(name = 'All')

Or this can be done in base R with table/add_margins

addmargins(table(df), 1)
         comp_fail
Condition  0  1
   Group1 17  3
   Group2 14  6
   Group3 18  2
   Group4 14  6
   Sum    63 17

Summarizing grouped and ungrouped data in a single tibble with tidyverse in R

2 Answers2