I know there are many posts about the memory consumption of mclapply,
but I'd still like to see whether anything can help in my case.
I'm fitting a random forest model to data with ~600 samples and ~60,000 variables (y is the response vector, X is the 600 by 60,000 matrix of variables):
library(randomForest)
fit <- randomForest(x=X,y=y)
I then want to compare that fit to a random fit, and to do that I run:
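library(parallel)
# L'Ecuyer-CMRG makes set.seed() reproducible across mclapply workers
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
# ncores is set beforehand to the number of worker processes
random.list <- mclapply(1:1000, function(f) {
  # random permutation of the row indices (base R equivalent of permute::shuffle)
  idx <- sample(nrow(X))
  # predictions for the row-shuffled matrix; returned as the list element
  predict(object = fit, newdata = X[idx, ], type = "response")
}, mc.cores = ncores)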
Unfortunately this is too memory-intensive (it requires more than 100 GB), which makes it impractical.
BTW, I'm running this on Linux.
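In case it helps anyone reproduce this, here is a minimal, scaled-down sketch of the same workflow. Everything specific in it is an assumption for illustration and not my real setup: the simulated data, the reduced dimensions (n, p), the continuous response, ntree = 50, only 10 permutations, and mc.cores = 2. It also prints the size of X and fit, since those are the objects every forked worker touches.

```r
library(randomForest)
library(parallel)

# Reproducible random streams across forked workers
RNGkind("L'Ecuyer-CMRG")
set.seed(1)

# Assumed, scaled-down dimensions (the real data is ~600 by ~60,000)
n <- 60
p <- 200

# Simulated predictor matrix and response (assumption: continuous y)
X <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("v", seq_len(p))))
y <- rnorm(n)

# Small forest so the example runs quickly (ntree = 50 is an assumption)
fit <- randomForest(x = X, y = y, ntree = 50)

# Sizes of the objects shared with each worker
print(object.size(X), units = "Mb")
print(object.size(fit), units = "Mb")

# Same permutation loop as above, with fewer iterations and cores
random.list <- mclapply(1:10, function(f) {
  idx <- sample(nrow(X))
  predict(object = fit, newdata = X[idx, ], type = "response")
}, mc.cores = 2)

# Each list element holds one prediction per row of X
str(random.list[[1]])
```

Scaling n, p, the number of iterations, and mc.cores back up toward the real values should show where the memory use starts to grow.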
Any suggestions?