
I've seen a few other posts on this topic, and none seemed to be quite the same as the problem I'm having. But here goes:

I'm running a function in parallel using

cores <- detectCores()
cl <- makeCluster(8L, outfile = "output.txt")
registerDoParallel(cl)
x <- foreach(i = 1:length(y), .combine = 'list',
             .packages = c('httr', 'jsonlite'),
             .multicombine = TRUE, .verbose = FALSE, .inorder = FALSE) %dopar% {
  my_function(y[i])  # placeholder for the actual (proprietary) function
}

This often works fine, but is now throwing the error:

Error in serialize(data, node$con) : error writing to connection

Upon examination of the output.txt file I see:

starting worker pid=11112 on localhost:11828 at 12:38:32.867
starting worker pid=10468 on localhost:11828 at 12:38:33.389
starting worker pid=4996 on localhost:11828 at 12:38:33.912
starting worker pid=3300 on localhost:11828 at 12:38:34.422
starting worker pid=10808 on localhost:11828 at 12:38:34.937
starting worker pid=5840 on localhost:11828 at 12:38:35.435
starting worker pid=8764 on localhost:11828 at 12:38:35.940
starting worker pid=7384 on localhost:11828 at 12:38:36.448
Error in unserialize(node$con) : embedded nul in string: '\0\0\0\006SYMBOL\0\004\0\t\0\0\0\003')'\0\004\0\t\0\0\0\004expr\0\004\0\t\0\0\0\004expr\0\004\0\t\0\0\0\003','\0\004\0\t\0\0\0\024SYMBOL_FUN'
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

This error is intermittent. Memory is plentiful (32GB), and no other large R objects are in memory. The function in the parallel code retrieves a number of small json data objects from the cloud and puts them into an R object - so there are no large data files. I don't know why it occasionally sees an embedded nul and stops.
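One diagnostic pattern worth noting here (a sketch, not the asker's actual code — `y` and the `nchar` call are toy stand-ins for the real inputs and fetch function): wrap the body of the `%dopar%` loop in `tryCatch`, so a single failing iteration returns an error message instead of potentially killing the worker process, whose death is what typically surfaces as these serialize/unserialize connection errors:

```r
library(doParallel)

cl <- makeCluster(2L, outfile = "output.txt")
registerDoParallel(cl)

# Toy stand-in for the real inputs (e.g. URLs of small JSON objects)
y <- sprintf("item_%02d", 1:16)

x <- foreach(i = seq_along(y), .inorder = FALSE) %dopar% {
  # Catch errors inside the worker so one failed fetch/parse
  # yields a value instead of crashing the worker process
  tryCatch(
    nchar(y[i]),                       # stand-in for the real fetch call
    error = function(e) conditionMessage(e)
  )
}

stopCluster(cl)
```

If a worker then returns an error string instead of data, the failure is in the function body; if the connection errors persist even with this wrapper, the workers are dying for some other reason (OS, firewall, memory).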

I have a similar problem with a function that pulls csv files from the cloud as well. Both functions worked fine under R 3.3.0 and R 3.4.0 until now.

I'm using R 3.4.1 and RStudio 1.0.143 on Windows.

Here's my sessionInfo:

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RJSONIO_1.3-0     RcppBDT_0.2.3     zoo_1.8-0         data.table_1.10.4 doParallel_1.0.10 iterators_1.0.8  
[7] RQuantLib_0.4.2   foreach_1.4.3     httr_1.2.1       

loaded via a namespace (and not attached):
[1] Rcpp_0.12.12     lattice_0.20-35  codetools_0.2-15 grid_3.4.1       R6_2.2.2         jsonlite_1.5     tools_3.4.1     
[8] compiler_3.4.1

UPDATE

Now I get another similar error:

Error in unserialize(node$con) : ReadItem: unknown type 100, perhaps written by later version of R

The embedded nul error seems to have vanished. I've also tried deleting .Rhistory and .RData, and deleting my packages subfolder and reinstalling all packages. At least this new error is consistent. I can't find what "unknown type 100" refers to.

JK_chitown
  • Maybe you have large objects in your environment that are exported on the clusters? Try putting this foreach call in its own function. – F. Privé Jul 29 '17 at 06:29
  • That doesn't seem to be the problem - I actually deleted all extraneous objects in the environment. – JK_chitown Jul 31 '17 at 16:19
  • Can you reproduce the problem with a `function` that you can give us? – F. Privé Jul 31 '17 at 16:48
  • Wish I could, but `function` uses internal company information....I've been using this setup for months and now suddenly this error.... – JK_chitown Jul 31 '17 at 16:58

2 Answers


I also noticed that multi-core worker sessions don't go away in Task Manager.

Switching from stopCluster(cl) to stopImplicitCluster() worked for me. From my reading, stopImplicitCluster() is meant for the "one line" registerDoParallel(cores = x) form, as opposed to:

cl<-makeCluster(x)
registerDoParallel(cl)

My "gut feeling" is that how Windows handles the clusters requires the stopImplicitCluster, but your experience may vary.

I would have commented but this is (cue band) MY FIRST STACKOVERFLOW POST!!!


I get a similar error; it usually happens on a subsequent script run after one of my previous scripts errored out or I stopped it early. That may be behind the part where you mention "I don't know why it occasionally sees an embedded nul and stops".

The linked question below has some good info, especially the advice to leave one core free for regular Windows processes. It also mentions "If you get an error from either of those functions, it usually means that at least one of the workers has died", which backs up my theory about crashing after an earlier error.

doParallel error in R: Error in serialize(data, node$con) : error writing to connection

So far, my solution has been to re-initialize the parallel backend by running this again:

registerDoParallel(cl)

It usually works afterwards, but I notice that the previous multi-core sessions in Task Manager do not go away, even after:

stopCluster(cl)

This is why I sometimes restart R.
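A related pattern (a general sketch, not a guaranteed fix — the `run_parallel` wrapper and its toy loop are made up for illustration): create the cluster inside a function and register `stopCluster` via `on.exit`, so the workers are shut down even when the foreach loop errors out or is interrupted, the situation that seems to leave orphaned sessions behind:

```r
library(doParallel)

run_parallel <- function(n_workers = max(1L, parallel::detectCores() - 1L)) {
  cl <- makeCluster(n_workers)          # default leaves one core free for Windows
  on.exit(stopCluster(cl), add = TRUE)  # runs even on error or user interrupt
  registerDoParallel(cl)
  foreach(i = 1:4, .combine = c) %dopar% { i + 1 }
}

res <- run_parallel(2L)
```

With this, a crashed or interrupted run cleans up after itself, which avoids having to restart R just to reclaim the worker processes.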

BigTimeStats
  • Yes, I've also had the problem of cores not halting even after `stopCluster(cl)`. I'll try to re-initialize the backend. I've tried to reduce the number of cores by 1 for regular windows processes, but this doesn't seem to help. – JK_chitown Jul 28 '17 at 19:32
  • I just tried the code again and it ran - but this is the first time in the last 15 attempts. The output file indicates nothing wrong this time. But hardly a confidence builder..... – JK_chitown Jul 28 '17 at 19:40
    Three more attempts resulted in failure. Restarting R, reducing cores, and re-initializing the parallel backend do not help. – JK_chitown Jul 28 '17 at 20:15