35

This is my code. The stuff inside the loop makes sense.

        library(foreach)
        library(doParallel)
        cl <- makeCluster(7)
        registerDoParallel(cl) 

        elasticitylist = foreach(i=1:nhousehold) %dopar% {

            pricedraws = out$betadraw[i,12,] 
            elasticitydraws[,,i]= probarray[,,i] %*% diag(pricedraws)
            elasticitydraws[,,i] = elasticitydraws[,,i] * as.vector(medianpricemat)

        } 

I keep getting this error:

Error in serialize(data, node$con) : error writing to connection

I know I have enough cores (there are 20). Can anyone help with this? It seems the answer is nowhere to be found in the docs!

When I run ps -ef | grep user on my Unix server, I get:

    /apps/R.3.1.2/lib64/R/bin/exec/R --slave --no-restore -e parallel:::.slaveRSOCK() --args MASTER=localhost PORT=11025 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
wolfsatthedoor
  • Code essentially identical to yours except for some hand generated data works for me. If you make the example reproducible I can take another look. Are you using unusual data structures? – kasterma Feb 13 '15 at 16:16
  • The data are very big, but they aren't unusual. I think out$betadraw is a matrix slice though, could that be it? – wolfsatthedoor Feb 13 '15 at 16:21

6 Answers

21

The functions serialize and unserialize are called by the master process to communicate with the workers when using a socket cluster. If you get an error from either of those functions, it usually means that at least one of the workers has died. On a Linux machine, it might have died because the machine was almost out of memory, so the out-of-memory killer decided to kill it, but there are many other possibilities.

I suggest using the outfile="" option of makeCluster when creating the cluster object, so that output from the workers is displayed. If you're lucky, you'll get an error message from a worker before it dies that will help you solve the problem.
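For illustration, a minimal sketch of what that might look like, reusing the 7-worker cluster from the question (the worker count is just the question's value, not a recommendation):

    library(doParallel)
    # outfile = "" sends the workers' stdout/stderr to the master's console
    # instead of discarding it, so you can see a worker's last message before it dies
    cl <- makeCluster(7, outfile = "")
    registerDoParallel(cl)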

Steve Weston
  • My friend said it's because when the output you try to build is too large in memory, it typically errors regardless of how much RAM you have. Do you know a way around this? – wolfsatthedoor Feb 13 '15 at 22:09
10

I had the same problem when I tried to use all 8 cores of my machine. When I left one core free, the problem went away. I believe the system needs one core left free for its own service tasks, or else you'll get an error:

library(doParallel)
# Find out how many cores are available (if you don't already know)
cores <- detectCores()
# Create cluster with desired number of cores, leaving one free for the
# machine's own processes
cl <- makeCluster(cores[1] - 1)
# Register cluster
registerDoParallel(cl)
  • I don't think it's the sheer number of cores that causes the error; I have noticed that, when I work on large files (4+ GB) and use `doParallel`, I need to reduce the number of cores used, otherwise I run low on memory. So the problem could be that too many cores use too much memory, which throws the error. Can anyone with more knowledge confirm/disprove/explain this? – g_puffo Jul 10 '16 at 18:07
  • It is likely that `doParallel` copies all of the data from the master to the slaves, so the memory demand would scale linearly with the number of cores. The master and slaves all draw from the same memory pool, so it is not hard to breach memory limits. – Kevin L. Keys Nov 08 '16 at 22:29
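Building on that comment, a rough back-of-the-envelope check before choosing a worker count (the object and numbers below are made up for illustration): if each worker receives its own copy of the exported data, the extra memory demand is roughly the object size times the number of workers.

    # Hypothetical example: estimate the extra memory if every worker gets a copy
    big_matrix <- matrix(rnorm(1e7), ncol = 100)       # stand-in for your data
    per_worker <- as.numeric(object.size(big_matrix))  # bytes copied to one worker
    n_workers  <- 6
    cat("approx. extra memory:", round(per_worker * n_workers / 1e9, 2), "GB\n")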
5

I received a similar error in the following situation, where I terminated my model training early and then tried to run it again. Here is an example: I am using the caret package to train a model, but I think this applies to any application where parallel processing is involved.

> cluster <- makeCluster(10)
> registerDoParallel(cluster)
> train(... , trControl = trainControl(allowParallel = T))
# Terminated before completion
> train(... , trControl = trainControl(allowParallel = T))
Error in serialize(data, node$con) : error writing to connection

I closed the cluster and reinitialized it:

stopCluster(cluster)
registerDoSEQ()
cluster <- makeCluster(10)
registerDoParallel(cluster)

I did not see the error when running the model again. Sometimes turning it off and back on again really is the solution.
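A hedged sketch of one way to avoid the stale-cluster situation in the first place: wrap the parallel call so the cluster is always shut down, even when the run is interrupted (the foreach call below is only a stand-in for the interrupted train() call):

    library(doParallel)

    cluster <- makeCluster(2)
    registerDoParallel(cluster)
    result <- tryCatch(
      foreach(i = 1:4) %dopar% sqrt(i),  # stand-in for the long-running model fit
      finally = {
        stopCluster(cluster)             # always release the workers
        registerDoSEQ()                  # fall back to the sequential backend
      }
    )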

cacti5
4

Each core you assign consumes memory, so more cores mean more memory is demanded, and as soon as you run out of it you will receive this error. My suggestion is to reduce the number of cores used for parallelization.

Having 8 cores and 32 GB of memory available, I tried using 7 and then 6 cores and ran into a similar error. After that I decided to dedicate only 4 cores, which consumes around 70% of the memory:

[screenshot: memory usage at around 70% with 4 cores]

One more core probably would have worked.
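A minimal sketch of the adjustment described above (4 workers is this answer's choice for an 8-core, 32 GB machine, not a general rule; watch memory with top or a similar tool while the job runs):

    library(doParallel)
    cl <- makeCluster(4)   # fewer workers than physical cores, to leave memory headroom
    registerDoParallel(cl)
    # ... run the parallel job here, monitoring memory usage ...
    stopCluster(cl)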


Abdul Basit Khan
3

After receiving this error message, I changed my code to a non-parallel for loop. Then I received the error message "cannot allocate vector of size *** Gb". I guess the parallel failure may be caused by the same thing, just with a different error message.
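One low-effort way to run the same check without rewriting the loop is to register the sequential backend: the identical %dopar% code then runs in a single process and reports the underlying error (such as the allocation failure above) directly. A minimal sketch:

    library(foreach)
    registerDoSEQ()                           # %dopar% now runs sequentially, no workers
    res <- foreach(i = 1:3) %dopar% rnorm(5)  # same foreach code, tiny stand-in body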

Feng Jiang
0

I've run into similar problems with the mclapply function. I do not know the reason, because the error appears somewhat randomly. However, I am using this workaround, which works perfectly fine for me:

for (...) {   # loop over whatever you are processing (details omitted)
    .
    .
    .

    error_count <- 1
    while (error_count <= 3) {
      error <- try(
        FUNCTION_THAT_USES_mclapply()   # placeholder for the call that uses mclapply
      )

      if (!inherits(error, "try-error")) break

      error_count <- error_count + 1
      invisible(gc()); Sys.sleep(1)     # free memory and wait before retrying
    }
    if (inherits(error, "try-error")) next

    .
    .
    .
}

So far, whenever the error has occurred, the second iteration of the while loop has succeeded.
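For reference, a self-contained sketch of the same retry idea (the function and data below are made up for illustration; mclapply forks, so mc.cores > 1 is not available on Windows):

    library(parallel)

    retry_mclapply <- function(xs, fn, tries = 3, cores = 2) {
      for (attempt in seq_len(tries)) {
        res <- try(mclapply(xs, fn, mc.cores = cores), silent = TRUE)
        if (!inherits(res, "try-error")) return(res)
        invisible(gc()); Sys.sleep(1)   # free memory and wait before retrying
      }
      stop("mclapply failed after ", tries, " attempts")
    }

    out <- retry_mclapply(1:10, function(x) x^2)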

Sebastian