I'm trying to run parallel cv.glmnet Poisson models on a Windows machine with 64 GB of RAM. My data is a sparse matrix of 20 million rows x 200 columns, around 10 GB in size. I'm using makeCluster and doParallel, and setting parallel = TRUE in cv.glmnet. I currently have two issues with this setup:
1. Distributing the data to the worker processes takes hours, which significantly reduces the speedup. I know this can be avoided by forking on Linux machines, but is there any way of reducing this time on Windows?
2. I'm running this for multiple models with different data and responses, so the object size changes each time. How can I work out in advance how many cores I can use before getting an 'out of memory' error? I'm particularly confused about how the data gets distributed. If I run on 4 cores, the first rsession uses 30 GB of memory, while the others are closer to 10 GB each. What does that 30 GB go towards, and is there any way of reducing it?
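For reference, here is a minimal sketch of the setup described above, with a small synthetic sparse matrix standing in for the real data. The dimensions, density, Poisson rate, number of workers and fold count are placeholder assumptions chosen so the example runs quickly; they are not the real values.

```r
library(Matrix)      # sparse matrix classes and rsparsematrix()
library(glmnet)      # cv.glmnet()
library(doParallel)  # also attaches foreach and parallel

set.seed(1)

# Assumption: small synthetic stand-in for the real 20 million x 200 matrix
n <- 5000
p <- 200
x <- rsparsematrix(n, p, density = 0.05)  # sparse dgCMatrix
y <- rpois(n, lambda = 1)                 # Poisson counts as the response

# makeCluster() creates a PSOCK cluster by default, which is the type
# available on Windows. Each worker is a separate R process with its own memory.
cl <- makeCluster(2)  # assumption: 2 workers, just for the sketch
registerDoParallel(cl)

# With parallel = TRUE, cv.glmnet fits the folds via foreach's %dopar%
fit <- cv.glmnet(x, y, family = "poisson", nfolds = 3, parallel = TRUE)

stopCluster(cl)

# In-memory size of the objects that each worker needs a copy of
print(object.size(x), units = "Mb")
print(object.size(y), units = "Mb")
```

The real run is the same except for the size of x and y and the number of workers passed to makeCluster.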