
Recently, I have been trying to use Reference Classes with the parallel package (through doParallel). I tried four different schemes: multicore, MPI, socket (PSOCK) and forking (FORK). However, only multicore produces the correct result; MPI, PSOCK and FORK all produce errors.

PS: I was running this script on a computer cluster with MPI support.

library(doParallel)

np = 4L
# cl <- makeCluster(np, type="MPI", outfile = "")
# cl <- makeCluster(np, type="PSOCK", outfile = "")
# cl <- makeCluster(np, type="FORK", outfile = "")
cl <- np # multicore

registerDoParallel(cl)
myClass = setRefClass("myClass",
    fields = c("a"),
    methods = list(
        hello = function(){cat("hello\n")},
        show = function(){cat("show\n")}
    )
)
objs = foreach(i = 1:4) %dopar% {
    obj = new("myClass")
    obj$a=i
    obj
}

It may be related to "parallel computations on Reference Classes".

Update:

Further investigation shows that reference class instances are copied to the workers, but the reference class definition is not.

library(doParallel)

np = 4L
cl <- makeCluster(np, type="MPI", outfile = "")
# cl <- makeCluster(np, type="PSOCK", outfile = "")
# cl <- makeCluster(np, type="FORK", outfile = "")
# cl <- np # multicore
registerDoParallel(cl)

myClass = setRefClass("myClass",
    fields = c("a"),
    methods = list(
        hello = function(){cat("hello\n")},
        show = function(){cat("show\n")}
    )
)
obj = new("myClass")
obj$a = 0

results = foreach(i = 1:4) %dopar% {
    obj$a # no error
    newobj = new("myClass") # error
}
Randy Lai

1 Answer


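The problem is that the cluster workers need to be initialized: a PSOCK or MPI worker is a separate R process that never ran setRefClass, so it has no definition of "myClass". For a case like this, I would use either clusterEvalQ or clusterCall: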

clusterEvalQ(cl, {
    myClass <- setRefClass("myClass",
                           fields = c("a"),
                           methods = list(
                               hello = function(){cat("hello\n")},
                               show = function(){cat("show\n")}
                           ))
    NULL
})

Note that I included the "NULL" in the R expression to avoid serializing and returning the generator function from the cluster workers.
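A minimal end-to-end sketch of this fix, using only the base parallel package. It assumes a local PSOCK cluster with two workers and uses parLapply in place of foreach/%dopar%; the `library(methods)` line is only a precaution for older Rscript versions that did not attach methods by default. The class is also defined on the master so that the returned objects can be inspected there.

```r
library(parallel)

# Definition on the master, used when inspecting the returned objects.
myClass <- setRefClass("myClass",
                       fields = c("a"),
                       methods = list(
                           hello = function() cat("hello\n"),
                           show = function() cat("show\n")
                       ))

cl <- makeCluster(2L, type = "PSOCK")

# Repeat the definition in every worker's global environment.
# The trailing NULL keeps the generator from being sent back to the master.
invisible(clusterEvalQ(cl, {
    library(methods)
    myClass <- setRefClass("myClass",
                           fields = c("a"),
                           methods = list(
                               hello = function() cat("hello\n"),
                               show = function() cat("show\n")
                           ))
    NULL
}))

# With the class defined on the workers, new("myClass") succeeds there.
objs <- parLapply(cl, 1:4, function(i) {
    obj <- new("myClass")
    obj$a <- i
    obj
})
stopCluster(cl)

sapply(objs, function(o) o$a)
```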

This initialization isn't necessary with the "multicore" backend: it uses mclapply, so each worker is forked from a master process that has already called setRefClass, and it inherits the class definition.
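To illustrate the fork-based case, here is a sketch with parallel::mclapply, the function the multicore backend relies on. The class is defined only in the master, yet new("myClass") works in every child. It assumes a Unix-alike; on Windows mclapply cannot fork, so the sketch falls back to mc.cores = 1, which runs serially and does not actually demonstrate forking.

```r
library(parallel)

# Defined once, in the master process only.
myClass <- setRefClass("myClass", fields = c("a"))

# Forked children inherit the master's state, including the class definition.
# Forking is unavailable on Windows, where mc.cores must be 1 (serial run).
cores <- if (.Platform$OS.type == "windows") 1L else 2L

objs <- mclapply(1:4, function(i) {
    obj <- new("myClass")
    obj$a <- i
    obj
}, mc.cores = cores)

sapply(objs, function(o) o$a)
```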

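Interestingly, you don't have to do this initialization with a "FORK" cluster either, but you do have to call setRefClass before creating the "FORK" cluster: the workers are forked when makeCluster is called, so they only inherit what the master had defined at that moment.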
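The ordering requirement can be checked directly. In this sketch (Unix-alike only; two workers chosen arbitrarily), `laterClass` is a hypothetical second class defined after makeCluster. clusterEvalQ then asks each forked worker, via isClass, which of the two definitions it can see; the expectation is TRUE for myClass and FALSE for laterClass.

```r
library(parallel)

# Defined BEFORE the cluster exists, so the forked workers inherit it.
myClass <- setRefClass("myClass", fields = c("a"))

cl <- makeCluster(2L, type = "FORK")  # forking: Unix-alikes only

# Hypothetical second class, defined AFTER the workers were forked.
laterClass <- setRefClass("laterClass", fields = c("b"))

# Ask each worker which class definitions it can see.
seen <- clusterEvalQ(cl, c(myClass = isClass("myClass"),
                           laterClass = isClass("laterClass")))
stopCluster(cl)

seen
```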

Steve Weston
  • Thanks. Interestingly, a "FORK" cluster works fine on my own computer, but it fails on my school cluster (just like MPI and PSOCK). Anyway, I won't use "FORK" on my school cluster. – Randy Lai Mar 09 '14 at 21:37
  • I also note that reference class instances created by `new` are cloned to the workers, whereas the reference class definition is not. – Randy Lai Mar 09 '14 at 21:53
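The observation in the last comment can be reproduced with base parallel alone. A sketch, assuming a fresh local PSOCK cluster that has not been initialized: the instance is passed to each worker with clusterCall, its field is read, and tryCatch records whether new("myClass") succeeds. Based on the behaviour described in the question, each worker should report field 0 and canCreate FALSE.

```r
library(parallel)

myClass <- setRefClass("myClass", fields = c("a"))
obj <- myClass$new(a = 0)

# A fresh PSOCK cluster: the workers are NOT given the class definition.
cl <- makeCluster(2L, type = "PSOCK")

res <- clusterCall(cl, function(o) {
    list(
        # The instance arrives as a serialized copy, so its field is readable.
        field = o$a,
        # Creating a new instance needs the class definition, missing here.
        canCreate = tryCatch({ new("myClass"); TRUE },
                             error = function(e) FALSE)
    )
}, obj)
stopCluster(cl)

str(res)
```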