Following my question I use the following code:
dist<-c('att1','att2','att3','att4','att5','att6')
p1<-c('att1','att5','att2')
p2<-c('att5','att1','att4')
p3<-c('att3','att4','att2')
p4<-c('att1','att2','att3')
p5<-c('att6')
....
p32<-c('att35','att34','att32')
In the real case there can be 1024 vectors.
I would like to find all the relevant p
that the unification of them will be the maximal components of dist. I this case the solution would be p1
, p3
, p5
. I want to choose the minimal number of p
. In addition, in case there is no way to cover all the of dist component so I want to choose the maximal cover with minimal number of vectors (p).
N = 32
library(qdapTools)
library(dplyr)
library(data.table)
## generate matrix of attributes
attribute_matrix <- mtabulate(list(p1, p2, p3, p4, p5,...,p32))
library (bigmemory)
## generate matrix of attributes
grid_matrix <- do.call(CJ, rep(list(1:0), N)) %>% as.big.matrix
Error: cannot allocate vector of size 8.0 Gb
I tried an alternative way for it:
grid_matrix <- do.call(CJ, rep(list(1:0), N)) %>% as.data.frame
grid_matrix <- as.matrix (grid_matrix)
And still got the same error.
How can I fix it and use it for big data? I wanted to continue with:
colnames(grid_matrix) <- paste0("p", 1:N)
combin_all_element_present <- rowSums(grid_matrix %*% attribute_matrix > 0) %>% `==`(., ncol(attribute_matrix))
grid_matrix_sub <- grid_matrix[combin_all_element_present, ]
grid_matrix_sub[rowSums(grid_matrix_sub) == min(rowSums(grid_matrix_sub)), ]