I am implementing a parallel processing system which will eventually be deployed on a cluster, but I'm having trouble working out how the various methods of parallel processing interact.
I need to use a for loop to run a big block of code, which contains several large list of matrices operations. To speed this up, I want to parallelise the for loop with a foreach(), and parallelise the list operations with mclapply.
example pseudocode:
cl<-makeCluster(2)
registerDoParallel(cl)
outputs <- foreach(k = 1:2, .packages = "various packages") {
l_output1 <- mclapply(l_input1, function, mc.cores = 2)
l_output2 <- mclapply(l_input2, function, mc.cores = 2)
return = mapply(cbind, l_output1, l_output2, SIMPLIFY=FALSE)
}
This seems to work. My questions are:
1) is it a reasonable approach? They seem to work together on my small scale tests, but it feels a bit kludgy.
2) how many cores/processors will it use at any given time? When I upscale it to a cluster, I will need to understand how much I can push this (the foreach only loops 7 times, but the mclapply lists are up to 70 or so big matrices). It appears to create 6 "cores" as written (presumably 2 for the foreach, and 2 for each mclapply.