
I am using `fread` with the foreach and doParallel packages in R 3.2.0 on Ubuntu 14.04. The following code works fine, even though I didn't call `registerDoParallel`.

library(foreach)
library(doParallel)
library(data.table)

write.csv(iris,'test.csv',row.names=F)

cl<-makeCluster(4)

tmp<-foreach(i=1:10) %dopar% { t <- fread('test.csv') }

tmp<-rbindlist(tmp)

stopCluster(cl)
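A likely reason this "works" is that no backend is registered. In that case `%dopar%` falls back to running every iteration sequentially in the master session, where data.table is already attached. The sketch below checks this by comparing the process IDs that the iterations report. It assumes only that foreach and doParallel are installed, and the choice of 2 workers is arbitrary.

```r
library(foreach)
library(doParallel)

# No backend registered yet: %dopar% runs every iteration in the master
# process (foreach warns about this once per session, suppressed here).
pids_seq <- suppressWarnings(foreach(i = 1:4, .combine = c) %dopar% Sys.getpid())

cl <- makeCluster(2)
registerDoParallel(cl)
# With the cluster registered, iterations run in the worker processes.
pids_par <- foreach(i = 1:4, .combine = c) %dopar% Sys.getpid()
stopCluster(cl)
registerDoSEQ()  # drop the reference to the stopped cluster

print(unique(pids_seq))  # only the master's PID
print(unique(pids_par))  # the workers' PIDs, none equal to the master's
```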

However, on Windows 7 it fails, with or without `registerDoParallel`:

library(foreach)
library(doParallel)
#library(doSNOW)
library(data.table)

write.csv(iris,'test.csv',row.names=F)

cl<-makeCluster(4) 
registerDoParallel(cl)
#registerDoSNOW(cl)

tmp<-foreach(i=1:10) %dopar% { t <- fread('test.csv') }

tmp<-rbindlist(tmp)

stopCluster(cl)

The doSNOW package (the commented-out lines) doesn't work either. This is the error message:

Error in { : task 1 failed - "could not find function "fread""
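The error appears because each worker started by `makeCluster` is a separate, fresh R session. Packages attached in the master are therefore not attached on the workers. The small sketch below uses the parallel package directly, with 2 workers as an arbitrary choice. It checks whether `fread` is visible on the workers before and after loading data.table there.

```r
library(parallel)

cl <- makeCluster(2)
# Fresh worker sessions: data.table is not attached, so fread is not found.
before <- unlist(clusterEvalQ(cl, exists("fread")))
# Attach data.table on every worker.
invisible(clusterEvalQ(cl, library(data.table)))
after <- unlist(clusterEvalQ(cl, exists("fread")))
stopCluster(cl)

print(before)  # FALSE FALSE
print(after)   # TRUE TRUE
```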

Does anyone have any similar experience?


A follow-up question concerns nested foreach loops. The following doesn't seem to work:

cl<-makeCluster(4)
registerDoParallel(cl)
clusterEvalQ(cl , library(data.table))

tmp<-foreach(j=1:10) %dopar% {
  tmp1<-foreach(i=1:10) %dopar% {
    t<-fread('test.csv',data.table=T)
  }
  rbindlist(tmp1)
}
stopCluster(cl)

   

  • Note: without `registerDoParallel` it will not run in parallel (you only get a warning the first time). To get `fread` to work you may need to make the functions available on each worker, with something like `clusterEvalQ(cl, library(data.table))` (untested) – user20650 Jun 16 '15 at 01:31
  • Might be of interest: http://stackoverflow.com/questions/17345271/r-how-does-a-foreach-loop-find-a-function-that-should-be-invoked and http://stackoverflow.com/questions/27341210/foreach-works-even-without-exporting-variable-and-specifying-package-dependency – user20650 Jun 16 '15 at 01:34
  • Yes, you are right. Thanks for pointing it out. I also get this warning only the first time. It looks like I have the same problem on both Ubuntu and Windows. – Lamothy Jun 16 '15 at 01:35
  • With `registerDoParallel(cl)` in ubuntu I got the same error message `Error in { : task 1 failed - "could not find function "fread""`. – Lamothy Jun 16 '15 at 01:42
  • @user20650, thanks for the tips. `foreach(i=1:10,.export='fread')` can solve the problem. – Lamothy Jun 16 '15 at 01:50
  • This also seems to work on ubuntu `clusterEvalQ(cl , library(data.table))` after registering the backend – user20650 Jun 16 '15 at 01:53
  • Cool. If I use functions from more than one package, does it work to load all of them on the cluster with `clusterEvalQ`? – Lamothy Jun 16 '15 at 02:00
  • `clusterEvalQ(cl , c(library(data.table), library(foreach)))`, but I wouldn't imagine this is the way to go – user20650 Jun 16 '15 at 02:31
  • You are right again, @user20650. It works! – Lamothy Jun 16 '15 at 02:40
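On loading several packages: `clusterEvalQ` evaluates an arbitrary expression on every worker. A braced block with one `library()` call per package is therefore a plainer alternative to wrapping the calls in `c(...)`. The minimal sketch below makes a few assumptions. It writes the CSV to a temporary file, and `csv` is a name chosen only for this example. It also uses 2 workers. Because `csv` is defined in the calling environment, foreach exports it to the workers automatically.

```r
library(foreach)
library(doParallel)
library(data.table)

csv <- tempfile(fileext = ".csv")  # hypothetical file location for this sketch
write.csv(iris, csv, row.names = FALSE)

cl <- makeCluster(2)
registerDoParallel(cl)
# One clusterEvalQ call runs the whole block on every worker.
invisible(clusterEvalQ(cl, {
  library(data.table)
  library(foreach)
  NULL  # avoid sending the library() return values back to the master
}))
tmp <- foreach(i = 1:4) %dopar% fread(csv)
stopCluster(cl)

tmp <- rbindlist(tmp)  # 4 copies of iris stacked: 600 rows
```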

1 Answer


Thanks to user20650 for the references in the comments. The problem can be solved by setting `.export = 'fread'` in the `foreach` call, which exports the `fread` function to the workers.

Concretely, the following fixes the problem:

tmp <- foreach(i = 1:10, .export = 'fread') %dopar% {
  t <- fread('test.csv', data.table = TRUE)
}
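foreach also has a `.packages` argument, which loads the listed packages on each worker before the loop body runs. This makes every data.table function available at once, instead of exporting functions one by one. The sketch below assumes a temporary CSV file, where `csv` is a hypothetical name for this example, and 2 workers.

```r
library(foreach)
library(doParallel)
library(data.table)

csv <- tempfile(fileext = ".csv")  # hypothetical file location for this sketch
write.csv(iris, csv, row.names = FALSE)

cl <- makeCluster(2)
registerDoParallel(cl)
# .packages attaches data.table on each worker before evaluating the body,
# so fread (and rbindlist, etc.) can be found there.
tmp <- foreach(i = 1:4, .packages = "data.table") %dopar% fread(csv)
stopCluster(cl)

tmp <- rbindlist(tmp)  # 600 rows
```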

user20650 answered my follow-up question about nested foreach in the comments: load both packages on every worker with `clusterEvalQ(cl, c(library(data.table), library(foreach)))`. The following code seems to work on both Ubuntu and Windows.

cl <- makeCluster(4)
registerDoParallel(cl)
# load data.table and foreach on every worker
clusterEvalQ(cl, c(library(data.table), library(foreach)))

tmp <- foreach(j = 1:10) %dopar% {
  tmp1 <- foreach(i = 1:10) %dopar% { t <- fread('test.csv', data.table = TRUE) }
  rbindlist(tmp1)
}
stopCluster(cl)
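In the nested version, the inner `%dopar%` runs on a worker where no backend is registered. foreach therefore executes the inner loop sequentially there, with a warning. foreach's nesting operator `%:%` (the approach behind the double-loop link in the comment below) instead merges both loops into one stream of tasks for the registered workers. The sketch below assumes a temporary CSV file (`csv` is a hypothetical name), 2 workers, and smaller loop ranges. It loads data.table with `clusterEvalQ` so that `fread` is found on the workers.

```r
library(foreach)
library(doParallel)
library(data.table)

csv <- tempfile(fileext = ".csv")  # hypothetical file location for this sketch
write.csv(iris, csv, row.names = FALSE)

cl <- makeCluster(2)
registerDoParallel(cl)
invisible(clusterEvalQ(cl, library(data.table)))

# %:% turns the two loops into 3 x 4 tasks spread over the workers;
# no backend needs to be registered on the workers themselves.
tmp <- foreach(j = 1:3) %:%
  foreach(i = 1:4) %dopar% fread(csv)
stopCluster(cl)

# tmp is a list of 3 lists (one per j), each holding 4 data.tables.
res <- lapply(tmp, rbindlist)
```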
  • see http://stackoverflow.com/questions/30927693/how-can-i-parallelize-a-double-for-loop-in-r?answertab=votes#tab-top about the double for loop – user20650 Jun 19 '15 at 00:15