My point of departure is the whigs data from the ggraph package. It contains an incidence matrix.

Now, for each combination of columns/variables, I'd like to know, row by row, whether all the columns in that combination are 1, and create a new column for that combination containing 1 if they all are and 0 if not.

The whigs data is just an example: I'm looking for a vectorized method that can be used regardless of the number of columns/combinations.

Using dplyr, I can use across() in the mutate() function to create multiple new columns, but I can't figure out how to create those columns on the basis of the various combinations of columns.

Also using dplyr, I can use c_across() in the mutate() function, in tandem with the rowwise() function, to create a single new column based on the values in multiple columns.

Maybe these two can be combined somehow?
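For a single, hand-picked combination, the `rowwise()` + `c_across()` route described above might look like the sketch below. The data frame `df` is a small made-up stand-in for an incidence matrix, and the combination `C`/`D` is chosen arbitrarily for illustration.

```r
library(dplyr)

# Small hypothetical 0/1 incidence data standing in for the whigs matrix
df <- data.frame(A = c(0, 0, 0, 0),
                 B = c(1, 0, 0, 1),
                 C = c(0, 1, 1, 0),
                 D = c(0, 1, 1, 1))

# One new column for one fixed combination: 1 if both C and D are 1 in that row
res <- df %>%
  rowwise() %>%
  mutate(`C/D` = as.numeric(all(c_across(c(C, D)) == 1))) %>%
  ungroup()
```

This handles one combination per `mutate()` call, so it has to be repeated for every combination, and row-wise evaluation tends to be slow on larger data.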

Alexander

1 Answer

You could try

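library(dplyr)

df <- data.frame(A = rep(0, 4),
                 B = c(1, 0, 0, 1),
                 C = c(0, 1, 1, 0),
                 D = c(0, 1, 1, 1))
cols <- seq_len(ncol(df))

# All combinations of 2 to n column indices, flattened into one list of index vectors
# (lapply instead of sapply so R never tries to simplify the result)
combs <- unlist(lapply(cols[-1], function(m) {
  asplit(combn(cols, m = m), 2)
}), recursive = FALSE)

# For each combination, add a column named "i/j/..." that is 1 when every
# column in it is 1 (the row sum equals the number of columns), else 0
for (x in combs) {
  df <- df %>%
    mutate(!!paste0(x, collapse = "/") := as.numeric(rowSums(df[, x]) == length(x)))
}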

We create all combinations of two or more columns (by index) and, for each combination, check whether all of its columns equal 1 by comparing the row sum with the number of columns in the combination (this relies on the data containing only 0s and 1s). Each combination gets a new column named "x/y/z...", where x, y and z are the indices of the columns compared; it is 1 where all those columns are 1 and 0 otherwise. Careful: with n columns there are 2^n - n - 1 such combinations, so this gets expensive quickly as the number of columns grows.

  A B C D 1/2 1/3 1/4 2/3 2/4 3/4 1/2/3 1/2/4 1/3/4 2/3/4 1/2/3/4
1 0 1 0 0   0   0   0   0   0   0     0     0     0     0       0
2 0 0 1 1   0   0   0   0   0   1     0     0     0     0       0
3 0 0 1 1   0   0   0   0   0   1     0     0     0     0       0
4 0 1 0 1   0   0   0   0   1   0     0     0     0     0       0
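A variant of the same idea, sketched under the assumption that the data hold only 0s and 1s and have at least two columns, computes all the new columns first and adds them in a single assignment, so the data frame is not grown one column at a time. It uses `combn(n, m, simplify = FALSE)` to get the index vectors directly instead of `asplit()`, and it is plain base R.

```r
df <- data.frame(A = rep(0, 4),
                 B = c(1, 0, 0, 1),
                 C = c(0, 1, 1, 0),
                 D = c(0, 1, 1, 1))
n <- ncol(df)  # assumes n >= 2

# Every combination of 2 to n column indices, as one flat list of index vectors
combs <- unlist(lapply(2:n, function(m) combn(n, m, simplify = FALSE)),
                recursive = FALSE)

# One 0/1 vector per combination: 1 when all its columns are 1
# (assumes 0/1 data, so "all 1" is equivalent to "row sum == size")
new_cols <- lapply(combs, function(x) as.numeric(rowSums(df[, x]) == length(x)))
names(new_cols) <- vapply(combs, paste0, character(1), collapse = "/")

# Add all new columns at once, named "1/2", "1/3", ...
result <- df
result[names(new_cols)] <- new_cols
```

The new columns come out in the same order and with the same values as in the output table above.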
Martin Schmelzer
  • Thanks Martin! For some reason it doesn't return a 1 if both columns contain a 1. And I would like to do it for all combinations of columns, including the combinations of 3 to n columns. How would I go through all combinations without having to loop through combn(cols, m), where m is 2:n? – Alexander Nov 20 '20 at 13:45
  • Ah I misunderstood you. I corrected it. Got a bit uglier :) – Martin Schmelzer Nov 20 '20 at 16:26
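On enumerating all combinations without looping over `m`: one possible sketch, assuming the 0/1 data are stored as a numeric matrix with at least two columns, treats each integer from 1 to 2^n - 1 as a bit mask over the n columns, keeps the masks with at least two bits set, and then evaluates every combination at once with a single matrix product. The matrix `M` is a made-up example, and, unlike the answer above, the new columns are named after the column names rather than their indices.

```r
# Hypothetical 0/1 incidence matrix (rows = cases, columns = variables)
M <- cbind(A = c(0, 0, 0, 0),
           B = c(1, 0, 0, 1),
           C = c(0, 1, 1, 0),
           D = c(0, 1, 1, 1))
n <- ncol(M)  # assumes n >= 2

# Bit i of mask k says whether column i is in subset k: an n x (2^n - 1) 0/1 matrix
masks <- seq_len(2^n - 1)
S <- sapply(masks, function(k) (k %/% 2^(seq_len(n) - 1)) %% 2)

# Keep only subsets with at least two columns (2^n - n - 1 of them)
S <- S[, colSums(S) >= 2, drop = FALSE]

# counts[r, j] = how many columns of subset j are 1 in row r (assumes 0/1 data);
# "all 1" means that count equals the subset size
counts <- M %*% S
all_one <- sweep(counts, 2, colSums(S), `==`) * 1
colnames(all_one) <- apply(S, 2, function(s) paste(colnames(M)[s == 1], collapse = "/"))

result <- cbind(M, all_one)
```

The columns follow mask order (A/B, A/C, B/C, A/B/C, ...) rather than `combn()` order. `S` still has one column per combination, so memory grows as 2^n just like any approach that materialises every combination.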