
I have a set of binary variables (with values 0 and 1), and I want to create a two-way count table that counts how often pairs of variables co-occur (i.e., both have the value 1). Here is an example dataset:

# Build a 5 x 6 data frame of random 0/1 values
mm <- matrix(0, 5, 6)
df <- data.frame(apply(mm, c(1, 2), function(x) sample(c(0, 1), 1)))
colnames(df) <- c("Horror", "Thriller", "Comedy", "Romantic", "Sci.fi", "gender")

In the end, I would like a table that counts the co-occurrence of Horror(=1) and gender(=1), Thriller(=1) and gender(=1), Comedy(=1) and gender(=1), Romantic(=1) and gender(=1), and Sci.fi(=1) and gender(=1).

cliu
  • Is gender always 1? If so, you just need to sum the occurrences (=1) in each column. – RobertoT Nov 23 '21 at 18:56
  • Gender can be 0. I may have accidentally generated all 1s for gender in the example. – cliu Nov 23 '21 at 20:15

1 Answer


Something like this?

library(dplyr)
df %>% 
  mutate(across(-gender, ~ifelse(.==1 & gender ==1, 1, 0), .names = "{col}_gender1" )) %>% 
  summarise(across(ends_with("gender1"), sum))
  Horror_gender1 Thriller_gender1 Comedy_gender1 Romantic_gender1 Sci.fi_gender1
1              1                3              2                1              0
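
As a cross-check, the same counts can be had in base R. This is only a minimal sketch, not part of the code above: the fixed `df` below is hypothetical data made up so the result is reproducible, and `genres` and `counts` are names introduced here for illustration.

```r
# Hypothetical fixed data with the same column names as in the question
df <- data.frame(
  Horror   = c(1, 0, 1, 0, 1),
  Thriller = c(1, 1, 0, 0, 1),
  Comedy   = c(0, 1, 1, 0, 0),
  Romantic = c(0, 0, 0, 1, 1),
  Sci.fi   = c(1, 1, 1, 1, 0),
  gender   = c(1, 0, 1, 1, 0)
)

# Every column except gender
genres <- setdiff(names(df), "gender")

# Multiplying each 0/1 genre column by the 0/1 gender column leaves a 1
# only in rows where both are 1, so the column sums are the
# co-occurrence counts.
counts <- colSums(df[genres] * df$gender)
counts
```

The result is a named vector, so it already prints one value per genre without needing `t()`.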
TarJae
  • Thank you. This is exactly what I was looking for. I would add t() at the end to make it vertical. – cliu Nov 23 '21 at 20:15
  • A follow-up question: do you know how to also add another column that counts all other variables with value 1 and gender = 0? Preferably in the same chunk of `dplyr` code. – cliu Nov 24 '21 at 01:01
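
One possible way to handle the follow-up in a single `dplyr` chunk is to pass a named list of functions to `across()`, one for each gender value. This is a hedged sketch rather than a confirmed answer from the thread: it assumes dplyr 1.0.0 or later (for `across()`), the fixed `df` is hypothetical data made up for illustration, and the `gender1`/`gender0` suffixes and the name `res` are labels chosen here.

```r
library(dplyr)

# Hypothetical fixed data with the same column names as in the question
df <- data.frame(
  Horror   = c(1, 0, 1, 0, 1),
  Thriller = c(1, 1, 0, 0, 1),
  Comedy   = c(0, 1, 1, 0, 0),
  Romantic = c(0, 0, 0, 1, 1),
  Sci.fi   = c(1, 1, 1, 1, 0),
  gender   = c(1, 0, 1, 1, 0)
)

# For every column except gender, count the rows where the column is 1
# and gender is 1, and separately the rows where the column is 1 and
# gender is 0. Output columns are named "<column>_<function name>".
res <- df %>%
  summarise(across(
    -gender,
    list(
      gender1 = ~ sum(.x == 1 & gender == 1),
      gender0 = ~ sum(.x == 1 & gender == 0)
    )
  ))
res
```

As with the original answer, appending `t()` would turn the one-row result into a vertical layout.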