Let's assume I have a binary matrix with 24 columns and 5000 rows. The columns are parameters (P1 - P24) measured on 5000 subjects, and each parameter is binary (0 or 1). (Note: my real data can contain as many as 40,000 subjects.)

# 5000 subjects x 24 binary parameters, filled with random 0/1 values
m <- matrix(sample(0:1, 5000 * 24, replace = TRUE), nrow = 5000, ncol = 24)
colnames(m) <- paste("P", c(1:24), sep = "")

Now I would like to enumerate all possible combinations of the 24 measured parameters:

comb <- expand.grid(rep(list(0:1), 24))
colnames(comb) <- paste("P", c(1:24), sep = "")

The final question is: how often does each of the possible row combinations from comb appear in matrix m? I managed to write code for this that adds the counts to comb as a new column, but it is really slow and would take about 328 days to run. Therefore the code below only considers the first 20 combinations:

comb$count <- 0
for (k in 1:20){    # considers only the first 20 combinations of comb
  for (i in 1:nrow(m)){
    if (all(m[i,] == comb[k,1:24])){
      comb$count[k] <- comb$count[k] + 1
    }  
  }
}

Is there a computationally more efficient way to do this, so that I can count all combinations in a short time? Thank you very much in advance for your help.

AJZ203
  • Looping through 16,777,216 combinations will definitely take time. Maybe collapse each row of m into a string and then use table(), or group and aggregate with data.table. – M.Viking May 10 '21 at 01:25

3 Answers


data.table is fast at this type of operation:

# Same random test data as in the question
m <- matrix(sample(0:1, 5000 * 24, replace = TRUE), nrow = 5000, ncol = 24)
colnames(m) <- paste("P", c(1:24), sep = "")
comb <- expand.grid(rep(list(0:1), 24))
colnames(comb) <- paste("P", c(1:24), sep = "")

library(data.table)
data_t = data.table(m)
ans = data_t[, .N, by = P1:P24]
dim(ans)
head(ans)

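The core of the call is by = P1:P24, which groups by all 24 columns, and .N, which gives the number of rows in each group. Only combinations that actually occur in m appear in ans, so it has at most 5000 rows.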
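
To make the grouping easier to see, here is a minimal sketch on a made-up five-row table (the name toy and its values are purely illustrative, and it assumes the data.table package is installed). Each distinct row becomes one group, and .N is the size of that group:

```r
library(data.table)

# Hypothetical toy data: 5 subjects, 3 binary parameters
toy <- data.table(
  P1 = c(0, 1, 1, 0, 1),
  P2 = c(0, 1, 1, 0, 0),
  P3 = c(1, 0, 0, 1, 0)
)

# Group by every column from P1 to P3 and count the rows in each group
counts <- toy[, .N, by = P1:P3]
print(counts)
```

The result has one row per distinct combination (three here), listed in order of first appearance, and the N column sums to the number of rows in toy.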

I used this as inspiration - How does one aggregate and summarize data quickly?

and the data.table introduction vignette: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html

M.Viking

If all you need is the combinations that occur in the data and how many times, this will do it:

m2 <- apply(m, 1, paste0, collapse="")
m2.tbl <- xtabs(~m2)
head(m2.tbl)
# m2
# 000000000001000101010010 000000000010001000100100 000000000010001110001100 000000000100001000010111 000000000100010110101010 000000000100101000101100 
#                        1                        1                        1                        1                        1                        1 
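
If the zero counts are needed as well, aligned with the rows of comb, one possible approach (a different technique from the one above, sketched here under assumptions) is to read each row as a binary number and count with tabulate. It relies on expand.grid varying its first column fastest, so P1 acts as the least significant bit. The names n_par and idx are illustrative, and the sketch uses 4 parameters so it runs instantly; with 24 parameters the same lines would use nbins = 2^24.

```r
set.seed(1)
n_par <- 4  # illustrative; the question uses 24

# Small random binary matrix standing in for m
m <- matrix(sample(0:1, 50 * n_par, replace = TRUE), ncol = n_par)
colnames(m) <- paste0("P", seq_len(n_par))

# All 2^n_par combinations; the first column varies fastest
comb <- expand.grid(rep(list(0:1), n_par))
colnames(comb) <- colnames(m)

# Row index in comb = 1 + P1*1 + P2*2 + P3*4 + ...
idx <- as.vector(m %*% 2^(seq_len(n_par) - 1)) + 1

# Count how often each index occurs; absent combinations get 0
comb$count <- tabulate(idx, nbins = 2^n_par)
head(comb)
```

This avoids string pasting and any loop over combinations: the work is one matrix product plus one pass over the rows of m.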
dcarlson

You can use apply to paste the values in each row into a single string and then use table to count how often each string occurs.

table(apply(m, 1, paste0, collapse = '-'))
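
For larger inputs, a possible variation (a sketch, not benchmarked here) is to build the row strings column-wise with do.call(paste0, ...) rather than calling paste0 once per row. The names keys and freq are illustrative, and the example uses a small 200 x 6 matrix:

```r
set.seed(42)

# Small random binary matrix standing in for m
m <- matrix(sample(0:1, 200 * 6, replace = TRUE), nrow = 200, ncol = 6)
colnames(m) <- paste0("P", 1:6)

# paste0(col1, col2, ...) is vectorised over rows: one string per subject.
# unname() keeps column names from being read as paste0 arguments.
keys <- do.call(paste0, unname(as.list(as.data.frame(m))))

# Frequency of each observed combination, most common first
freq <- sort(table(keys), decreasing = TRUE)
head(freq)
```

The strings are the same as those produced by apply(m, 1, paste0, collapse = ""), so the resulting table matches apart from the sort order.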
Ronak Shah