Here is my problem: I would like to generate a fairly large number of factorial combinations and then apply some constraints to narrow down the list of all possible combinations. However, this becomes an issue when the number of possible combinations gets extremely large. Let's take an example: assume we have 8 variables (A, B, C, etc.), each taking 3 levels/values (A = {1,2,3}, B = {1,2,3}, etc.). The list of all possible combinations has 3^8 (= 6561) entries and can be generated as follows:
tic <- function(){start.time <<- Sys.time()}
toc <- function(){round(Sys.time() - start.time, 4)}
nX = 8
tic()
# One vector of levels per variable (no need to pre-initialise lk)
lk = lapply(1:nX, function(x) c(1,2,3))
toc()
tic()
mapx = expand.grid(lk)
mapx$idx = 1:nrow(mapx)
toc()
So far so good, these operations are done pretty quickly (< 1 second) even if we significantly increase the number of variables.
The next step is to generate a corrected set of all pairwise comparisons, i.e. unordered pairs of distinct combinations. (An uncorrected set would be obtained by freely combining all 6561 options with each other, leading to 6561 × 6561 = 43046721 combinations.) The size of this "universe" would be 6561 × (6561 - 1) / 2 = 21520080. Already pretty big!
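As a quick sanity check on those counts, the arithmetic can be reproduced in R. This is only a small sketch; the variable names (`n_levels`, `n_vars`, `n_rows`, `n_ordered`, `n_unordered`) are made up for illustration and are not used elsewhere in the code.

```r
n_levels <- 3
n_vars   <- 8
n_rows   <- n_levels^n_vars        # 6561 rows in the full factorial grid

n_ordered   <- n_rows^2            # every row paired with every row: 43046721
n_unordered <- choose(n_rows, 2)   # unordered pairs without self-pairs: 21520080
```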
I am using the built-in R function `combn` to get it done. In this example the running time remains acceptable (about 20 seconds on my PC), but things become impractical with a higher number of variables and/or more levels per variable (the running time increases exponentially; for example, it already took 177 seconds with 9 variables!). My biggest concern, however, is that the resulting object would become so large that R can no longer handle it (memory issue).
tic()
univ = t(combn(mapx$idx,2))
toc()
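To put a rough number on the memory concern, here is a back-of-the-envelope sketch. It assumes the pair matrix stores integer indices at 4 bytes each and ignores R's object overhead and the temporary copy made by `t()`, so real usage would likely be higher. `pair_matrix_bytes` is a hypothetical helper written for this illustration, not part of base R.

```r
# Hypothetical helper: approximate size in bytes of the two-column pair matrix,
# assuming 4 bytes per integer index and ignoring R's object overhead.
pair_matrix_bytes <- function(n_vars, n_levels = 3) {
  n_rows <- n_levels^n_vars
  choose(n_rows, 2) * 2 * 4
}

# Approximate sizes in GB for 8 to 12 variables with 3 levels each
round(sapply(8:12, pair_matrix_bytes) / 1e9, 2)
```

Under these assumptions the size grows by roughly a factor of 9 for each extra 3-level variable, since the number of pairs scales with the square of the number of rows.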
The next step would be to identify the combinations meeting some pre-defined constraints. For instance, I would like to sub-select all pairs sharing exactly 3 common elements (i.e. exactly 3 variables take the same values in both members of the pair). Again, the running time will be very long (even with 8 variables), as my approach is to loop over all the pairs previously defined.
tic()
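# Flag each pair (1 = keep) whose two rows agree on exactly 3 variables
vrf = sapply(1:nrow(univ), function(x){
  j1 = mapx[mapx$idx==univ[x,1],-ncol(mapx)]  # first row of the pair, idx column dropped
  j2 = mapx[mapx$idx==univ[x,2],-ncol(mapx)]  # second row of the pair, idx column dropped
  cond = ifelse(sum(j1==j2)==3,1,0)
  return(cond)})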
toc()
tic()
univ = univ[vrf==1,]
toc()
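To make the constraint concrete, here is a minimal sketch of the same check on a smaller case (4 variables, exactly 2 shared values), written so that the full pair matrix is never stored. It is one possible direction, not a tested solution for the 8-variable case; the names `nX_small`, `grid_small`, `n_common` and `pairs_small` are made up for this illustration.

```r
# Smaller case (4 variables, 3 levels each) so the check runs instantly
nX_small   <- 4
grid_small <- as.matrix(expand.grid(rep(list(1:3), nX_small)))
n_common   <- 2  # number of variables that must match (3 in the example above)

# For each row i, compare it against all later rows j > i in one vectorised
# step, and keep only the pairs that satisfy the constraint.
pairs_small <- do.call(rbind, lapply(seq_len(nrow(grid_small) - 1), function(i) {
  j <- (i + 1):nrow(grid_small)
  # Repeat row i once per later row, then count equal positions per row
  row_i   <- matrix(grid_small[i, ], nrow = length(j), ncol = nX_small, byrow = TRUE)
  matches <- rowSums(grid_small[j, , drop = FALSE] == row_i)
  keep    <- j[matches == n_common]
  if (length(keep) == 0) return(NULL)
  cbind(i, keep)  # row indices of the two members of each retained pair
}))
```

Each row of `pairs_small` holds the row indices of one retained pair. Only the retained pairs are kept in memory, at the cost of one pass over the remaining rows for every row of the grid.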
Would you know how to overcome this issue? Any tips or advice would be more than welcome!