3

I have a data frame containing data that looks something like this:

df <- data.frame(
    group1 = c("High","High","High","Low","Low","Low"),
    group2 = c("male","female","male","female","male","female"),
    one = c("yes","yes","yes","yes","no","no"), 
    two = c("no","yes","no","yes","yes","yes"), 
    three = c("yes","no","no","no","yes","yes")
)

I want to summarise the counts of yes/no in the variables one, two, and three, which I would normally do one column at a time with `df %>% group_by(group1, group2, one) %>% summarise(n())`. Is there a way to summarise all three columns and bind the results into one output data frame without manually repeating the code for each column? I've tried a for loop, but I can't get `group_by()` to recognize the column name I pass in as input.
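The loop described above tends to fail because `group_by()` treats a bare name as a column of that name, not as a variable holding a column name. A minimal sketch of one workaround, assuming dplyr >= 1.0, is to look the column up through the `.data` pronoun. The names `cols`, `per_column`, and the `variable` column are made up for this illustration:

```r
library(dplyr)

df <- data.frame(
  group1 = c("High", "High", "High", "Low", "Low", "Low"),
  group2 = c("male", "female", "male", "female", "male", "female"),
  one = c("yes", "yes", "yes", "yes", "no", "no"),
  two = c("no", "yes", "no", "yes", "yes", "yes"),
  three = c("yes", "no", "no", "no", "yes", "yes")
)

# Hypothetical helper vector: the columns to summarise, as strings
cols <- c("one", "two", "three")

# Count each column separately, then stack the per-column results.
# .data[[col]] looks up the column whose name is stored in `col`.
per_column <- lapply(cols, function(col) {
  df %>%
    count(group1, group2, value = .data[[col]]) %>%
    mutate(variable = col)
}) %>%
  bind_rows()

per_column
```

Each row of `per_column` is one combination of group1, group2, source column, and yes/no value, so the per-column breakdown is preserved.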

Jeff238

3 Answers

4

Reshape the data to long format, then count:

library(dplyr)
library(tidyr)

df %>% pivot_longer(cols = one:three) %>% count(group1, group2, value)

#  group1 group2 value     n
#  <chr>  <chr>  <chr> <int>
#1 High   female no        1
#2 High   female yes       2
#3 High   male   no        3
#4 High   male   yes       3
#5 Low    female no        2
#6 Low    female yes       4
#7 Low    male   no        1
#8 Low    male   yes       2
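Note that the result above pools the yes/no counts across one, two, and three. If the counts are wanted separately for each original column, as in the question's `group_by(group1, group2, one)`, a possible variation is to keep the column name produced by `pivot_longer()` in the count. This is a sketch, not part of the original answer; `variable` and `per_variable` are illustrative names:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  group1 = c("High", "High", "High", "Low", "Low", "Low"),
  group2 = c("male", "female", "male", "female", "male", "female"),
  one = c("yes", "yes", "yes", "yes", "no", "no"),
  two = c("no", "yes", "no", "yes", "yes", "yes"),
  three = c("yes", "no", "no", "no", "yes", "yes")
)

per_variable <- df %>%
  # names_to stores the source column name ("one", "two", "three")
  pivot_longer(cols = one:three, names_to = "variable") %>%
  # counting by `variable` as well keeps the three columns apart
  count(group1, group2, variable, value)

per_variable
```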
Ronak Shah
1

This can be done with dplyr alone (no need for tidyr::pivot_*), though the output format is slightly different. (It works even without rowwise, though I am not sure exactly why.)

df <- data.frame(
  group1 = c("High","High","High","Low","Low","Low"),
  group2 = c("male","female","male","female","male","female"),
  one = c("yes","yes","yes","yes","no","no"), 
  two = c("no","yes","no","yes","yes","yes"), 
  three = c("yes","no","no","no","yes","yes")
)
library(dplyr)

df %>%
  group_by(group1, group2) %>%
  summarise(yes_count = sum(c_across(everything()) == 'yes'),
            no_count = sum(c_across(one:three) == 'no'), .groups = 'drop')
#> # A tibble: 4 x 4
#>   group1 group2 yes_count no_count
#>   <chr>  <chr>      <int>    <int>
#> 1 High   female         2        1
#> 2 High   male           3        3
#> 3 Low    female         4        2
#> 4 Low    male           2        1

Created on 2021-05-12 by the reprex package (v2.0.0)
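If one count per original column is preferred over a pooled total, a related dplyr-only sketch uses `across()` in place of `c_across()`. This is an illustrative variation, assuming dplyr >= 1.0; the `_yes` suffix and the name `yes_by_column` are arbitrary choices:

```r
library(dplyr)

df <- data.frame(
  group1 = c("High", "High", "High", "Low", "Low", "Low"),
  group2 = c("male", "female", "male", "female", "male", "female"),
  one = c("yes", "yes", "yes", "yes", "no", "no"),
  two = c("no", "yes", "no", "yes", "yes", "yes"),
  three = c("yes", "no", "no", "no", "yes", "yes")
)

yes_by_column <- df %>%
  group_by(group1, group2) %>%
  # across() applies the function to each column separately,
  # whereas c_across() combines the selected columns into one vector
  summarise(across(one:three, ~ sum(.x == "yes"), .names = "{.col}_yes"),
            .groups = "drop")

yes_by_column
```

The "no" counts can be obtained the same way by changing the comparison value.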

AnilGoyal
  • The reason is that `==` converts it to a logical vector. Check `df %>% group_by(group1, group2) %>% summarise(new = list(c_across(everything()) == "yes")) %>% pull(new)` – akrun May 12 '21 at 23:08
  • i.e. when you use `c_across`, it returns a `vector`: `df %>% group_by(group1, group2) %>% summarise(new = list(c_across(everything()))) -> out`. Now check `out`, `df`, and `out$new`. With `rowwise`, there is a constraint that the data is grouped by row. Here that constraint is absent, so the values are unlisted column-wise as usual for each group – akrun May 12 '21 at 23:12
  • Also you could use `table` with `unnest_wider`: `df %>% group_by(group1, group2) %>% summarise(count = list(table(c_across(everything()))), .groups = 'drop') %>% unnest_wider(count)` – akrun May 12 '21 at 23:20
  • Thanks @akrun for the explanation. Got it. – AnilGoyal May 13 '21 at 01:52
1

Using data.table

library(data.table)
melt(setDT(df), id.vars = c('group1', 'group2'))[, .(n = .N),
    .(group1, group2, value)]

-output

    group1 group2 value n
1:   High   male   yes 3
2:   High female   yes 2
3:    Low female   yes 4
4:    Low   male    no 1
5:    Low female    no 2
6:   High   male    no 3
7:    Low   male   yes 2
8:   High female    no 1
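Two side notes on the data.table version, offered as a sketch and not as part of the original answer. First, `setDT()` converts `df` in place, so `as.data.table()` is used below to work on a copy. Second, adding the `variable` column that `melt()` creates by default to the grouping keeps the counts separate per original column. The names `long` and `per_variable` are illustrative:

```r
library(data.table)

df <- data.frame(
  group1 = c("High", "High", "High", "Low", "Low", "Low"),
  group2 = c("male", "female", "male", "female", "male", "female"),
  one = c("yes", "yes", "yes", "yes", "no", "no"),
  two = c("no", "yes", "no", "yes", "yes", "yes"),
  three = c("yes", "no", "no", "no", "yes", "yes")
)

# as.data.table() copies, so df itself stays a plain data.frame
long <- melt(as.data.table(df),
             id.vars = c("group1", "group2"),
             measure.vars = c("one", "two", "three"))

# `variable` holds the source column name; grouping by it as well
# gives one count per group, column, and yes/no value
per_variable <- long[, .(n = .N), by = .(group1, group2, variable, value)]

per_variable
```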

With base R, we can use by and table (run this on the original data.frame: after setDT(), df[3:5] selects rows rather than columns)

by(df[3:5], df[1:2], function(x) table(unlist(x)))
akrun