
I am using the foreach package to run my code in parallel. It works correctly for "small" data sets, but for larger data sets it fails with the following error:

Error in serialize(data, node$con) : error writing to connection

This is my code:

library(foreach)
library(doParallel)
library(iterators)

# Use all but one of the detected cores for the worker processes
numCores <- detectCores() - 1
clm <- makePSOCKcluster(numCores)
registerDoParallel(clm)

# One parallel task per iteration; results are collected into a list
results <- foreach(i = 1:1000) %dopar% {
    myFunction(i, otherArguments)
}
stopCluster(clm)

While the code is running, I can see in the Task Manager that memory usage almost reaches 100% before the program stops. I have tried various numbers of cores and also setting memory.limit() (roughly as sketched below), but not in a systematic way, and the problem persisted. What could my problem be and how can I resolve it?
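For reference, adjustments of this kind look roughly as follows; the worker count and memory limit below are illustrative, not the exact values I used:

# Fewer workers than detectCores() - 1 (exact count varied between attempts)
clm <- makePSOCKcluster(4)
registerDoParallel(clm)

# Raise the Windows per-session memory cap (value in MB, illustrative only)
memory.limit(size = 28000)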

Computer: Windows 10 64-bit, 28 GB RAM, 8-core CPU, R 3.2.2 (64-bit)

I know something similar was posted here, but the issue was not resolved then.

  • In [This thread](http://stackoverflow.com/questions/37750937/doparallel-package-foreach-does-not-work-for-big-iterations-in-r) you will find a memory-efficient way using chunking. – 989 Jun 21 '16 at 11:42
  • @m0h3n Thanks for the post! I somehow managed to miss that thread in my search. I tried to implement the solutions with idiv() or idivix(), but my code results in an error. Is it safe to make global declarations like in those functions? How would you implement my foreach loop using chunking? – GerasimosPanagiotakopoulos Jun 21 '16 at 14:33
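For reference, a minimal sketch of the chunking approach mentioned in the comments above, assuming the itertools package is installed; isplitIndices() splits the index range 1:1000 into one chunk per worker, and myFunction / otherArguments are the placeholders from the question:

library(foreach)
library(doParallel)
library(itertools)

numCores <- detectCores() - 1
clm <- makePSOCKcluster(numCores)
registerDoParallel(clm)

# One task per chunk of indices rather than one per iteration,
# so far less data is serialized between the master and the workers
results <- foreach(idx = isplitIndices(1000, chunks = numCores),
                   .combine = c) %dopar% {
    lapply(idx, function(i) myFunction(i, otherArguments))
}
stopCluster(clm)

With .combine = c the per-chunk lists are concatenated, so results should again be a list of 1000 elements, one per iteration.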

0 Answers