I'm using a foreach loop to try to speed up some data processing. I'd post the full code, but it's about 2,000 lines long, so that doesn't seem worthwhile. Basically, I have a bunch of matrices (15 columns wide, 300 to 1,500 rows long) that I need to pass through Mplus using mclust and MplusAutomation. A for loop wraps around the foreach loop, which contains the mclust model fitting. Something like this:
library(doParallel)

registerDoParallel(4)

for (i in 1:10) {
  # I've broken the data into 10 smaller chunks; each .rda file holds a list called `data`
  if (i == 1) { load("file.rda") }
  if (i == 2) ...

  out <- foreach(sim = 1:length(data), .packages = c("mclust", "MplusAutomation")) %dopar% {
    # Fit various models in Mplus, saving the important output to a vector called `results`,
    # then return that vector as the last expression so what gets collected is the
    # list of results I need and not a single value.
    results[1:130]
  }

  if (i == 1) { save(out, file = "out.file.rda") }
  if (i == 2) ...
}
Anyway, I know the code works on smaller batches: for instance, if I tell it to run only on the first ten entries in each dataset (sim = 1:10 instead of 1:length(data)), it runs clean through without issue. However, when I ramp it up to the full dataset, I get errors like this:
Error in { : task 175 failed - "cannot open the connection"
It seems to happen at different points in the script, not always at the same time or place. I've tried varying how many cores it uses (4 to 6), varying how much data it loads at once (from all 6.6 GB down to a tenth of that), and increasing the working memory with memory.limit(size=56000), but none of these changes has let the code run without error. In fact, it has never managed to complete even one pass of the i loop.
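In case it's relevant, here's roughly how the backend gets set up between runs (a simplified sketch: the real script just calls registerDoParallel(4), so the explicit makeCluster()/stopCluster() and the n_cores variable are my shorthand for the values I've been varying):

library(doParallel)

n_cores <- 4                 # tried 4, 5, and 6 here
cl <- makeCluster(n_cores)   # explicit cluster instead of implicit registration
registerDoParallel(cl)

memory.limit(size = 56000)   # the working-memory bump mentioned above (Windows only)

# ... the for/foreach loops shown earlier run here ...

stopCluster(cl)              # shut the workers down at the end of the run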
Any suggestions?