
I tried using the snowfall package for parallel execution in R across all my CPU cores. This is the code I used for testing:

library(snow)
library(snowfall)
sfInit(parallel = TRUE, cpus = 16, type = "SOCK")
data <- array(1:1000000, dim=c(1000000,1))
system.time(x <- sfLapply(data, fun=function(x){return (x*x) }))

This runs roughly 16 times faster, since it uses all available CPU cores. But when I try this:

system.time(m2 <- J48(CHURNED_F~., data = data[, -c(1)]))

It takes about 50 seconds as a test (with only about 1% of the whole data set). The following runs correctly, but takes the same time and only uses one CPU:

library(snow)
library(snowfall)
sfInit(parallel = TRUE, cpus = 16, type = "SOCK")
system.time(m2 <- sfLapply("CHURNED_F~.", J48, data[, -c(1)]))

Am I just using the wrong syntax? How can I make this run in parallel?

Fermin
  • If I'm not entirely mistaken, RWeka is just using rJava. Hence, you are starting (multi-threaded) a supposedly single-threaded Java process. – CAFEBABE Jan 22 '16 at 17:05
  • Yes, it uses rJava. Then how do I start a multi-threaded Java process in this case? – Fermin Jan 22 '16 at 20:32
  • This is not really possible. Java is a separate process, so you would need to change the Java/Weka side. It is more promising to parallelise at a higher level, e.g., parallelise the cross-validation (which you are hopefully doing) or train with several parameter settings at once. – CAFEBABE Jan 22 '16 at 22:12
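
The "parallelise on a higher level" suggestion from the comments could look roughly like the sketch below, which runs one cross-validation fold per task using base R's parallel package. The helper names `cv_fold_accuracy` and `parallel_cv` are made up for illustration and are not part of snowfall or RWeka. The sketch also assumes that the model function accepts `(formula, data = ...)` and that `predict(model, newdata = ...)` returns class labels, as J48 does for a factor response.

```r
library(parallel)

# Hypothetical helper: train on all rows except `test_idx`, then return the
# accuracy on the held-out rows. `fit` is a model function or its name.
cv_fold_accuracy <- function(test_idx, formula, data, fit) {
  if (is.character(fit)) fit <- get(fit, mode = "function")
  response <- all.vars(formula)[1]
  model <- fit(formula, data = data[-test_idx, , drop = FALSE])
  pred <- predict(model, newdata = data[test_idx, , drop = FALSE])
  mean(as.character(pred) == as.character(data[[response]][test_idx]))
}

# Hypothetical helper: k-fold cross-validation on an existing cluster `cl`,
# with each fold sent to a worker as a separate task.
parallel_cv <- function(cl, formula, data, fit, k = 10) {
  # Randomly assign every row to one of k folds
  folds <- split(sample(nrow(data)), rep_len(seq_len(k), nrow(data)))
  unlist(parLapply(cl, folds, cv_fold_accuracy, formula, data, fit))
}

# Assumed usage with RWeka (not run here; needs RWeka on every worker):
# cl <- makeCluster(16)
# clusterEvalQ(cl, library(RWeka))
# acc <- parallel_cv(cl, CHURNED_F ~ ., data[, -1], "J48")
# stopCluster(cl)
```

Passing the model function by name ("J48") means each worker looks it up in its own session after loading RWeka, so no Java object references have to be shipped between processes. This does not make a single J48 fit any faster. It only helps when there are several independent fits to run.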

0 Answers