
I'm running a Bayesian MCMC probit model and trying to implement it in parallel. I'm getting confusing timing results when comparing the parallel run to the serial run. I don't have much experience with parallel processing, so it's possible I'm not doing it right.

I'm using MCMCprobit in the MCMCpack package for the probit model, and for parallel processing I'm using parLapply in the parallel package.

Here's my code for the serial run, and the results from system.time:

system.time(serial<-MCMCprobit(formula=econ_model,data=mydata,mcmc=10000,burnin=100))

   user  system elapsed 
 657.36   73.69  737.82

Here's my code for the parallel run:

#Setting up the functions for parLapply:
probit_modeling <- function(...) {
  args <- list(...)
  library(MCMCpack)
  MCMCprobit(formula=args$model, data=args$data, burnin=args$burnin, mcmc=args$mcmc, thin=1)
}

probit_Parallel <- function(mc, model, data,burnin,mcmc) {
  cl <- makeCluster(mc)
  ## To make this reproducible:
  clusterSetRNGStream(cl, 123)
  library(MCMCpack) # needed for c() method on master
  probit.res <- do.call(c, parLapply(cl, seq_len(mc), probit_modeling, model=model, data=data, 
                                        mcmc=mcmc,burnin=burnin))
  stopCluster(cl)
  return(probit.res)
}


system.time(test<-probit_Parallel(model=econ_model,data=mydata,mcmc=10000,burnin=100,mc=2))

And the results from system.time:

   user  system elapsed 
   0.26    0.53 1097.25 

Any ideas why user and system times would be so much shorter for the parallel process, but the elapsed time so much longer? I tried it at shorter MCMC runs (100 and 1000), and the story is the same. I'm assuming I'm making a mistake somewhere.

Here are my computer specifications:

  • R 3.1.3
  • 8 GB memory
  • Windows 7 64 bit
  • Intel Core i5 2520M CPU, dual core

1 Answer


It appears to me that each of the workers is doing as much work as is performed in the sequential version. The workers should each perform only a fraction of the total work in order to execute faster than the sequential version of the code. In this example, that might be accomplished by dividing mcmc by the number of workers, although that may not be what you really want to do.
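For concreteness, here is a minimal sketch of that splitting, reusing econ_model and mydata from the question and keeping the clusterSetRNGStream call from the original code (probit_split and per_worker are made-up names, not part of MCMCpack). Note that this gives you mc independent chains of mcmc/mc draws each, not one chain of mcmc draws, which is the caveat above:

library(parallel)
library(MCMCpack)

probit_split <- function(mc, model, data, burnin, mcmc) {
  cl <- makeCluster(mc)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, 123)        # kept as in the original code
  per_worker <- ceiling(mcmc / mc)    # e.g. 10000 iterations / 2 workers = 5000 each
  chains <- parLapply(cl, seq_len(mc), function(i, model, data, burnin, mcmc) {
    library(MCMCpack)
    MCMCprobit(formula = model, data = data, burnin = burnin, mcmc = mcmc)
  }, model = model, data = data, burnin = burnin, mcmc = per_worker)
  coda::mcmc.list(chains)             # keep the chains separate for diagnostics
}

system.time(test <- probit_split(mc = 2, model = econ_model, data = mydata,
                                 burnin = 100, mcmc = 10000))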

That duplication of work explains the long elapsed time reported by system.time. The "user" and "system" times are short because they only measure the master process, which uses very little CPU time while executing parLapply: the real CPU time is spent in the workers, and that isn't reported by system.time.
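You can see this master-versus-worker accounting with a toy example that has nothing to do with the probit model. Each worker busy-waits for about five seconds, so the CPU time is clearly on the workers' side, yet the master's system.time reports almost no user/system time:

library(parallel)

cl <- makeCluster(2)
system.time(
  parLapply(cl, 1:2, function(i) {
    start <- proc.time()[["elapsed"]]
    while (proc.time()[["elapsed"]] - start < 5) {}  # burn ~5 s of CPU on the worker
    i
  })
)
# user and system on the master come back close to zero, while elapsed is
# roughly 5 seconds: the master just waits for the workers' results.
stopCluster(cl)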

Steve Weston
  • Thanks, I hadn't realized I had to do mcmc/cores. – robin.datadrivers Jun 08 '15 at 02:47
  • @robin.datadrivers I'm not sure if that will give you the results you want, but you need some way of splitting the problem into smaller pieces, otherwise there isn't any benefit in executing in parallel. – Steve Weston Jun 08 '15 at 12:55
  • I'm trying to do 10,000 runs and want to make it faster. I think your solution of sending 5000 to each core is what I want, unless I'm interpreting it wrong or there is something else going on that I should be aware of. – robin.datadrivers Jun 08 '15 at 15:29
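A follow-up sketch on using the split result (not from the original thread): if the parallel run returns a coda mcmc.list, as the probit_split sketch above does, the standard coda summaries and diagnostics accept it directly, which is also a way to check whether the two 5,000-iteration chains agree with each other:

library(coda)

test <- probit_split(mc = 2, model = econ_model, data = mydata,
                     burnin = 100, mcmc = 10000)

summary(test)        # pooled posterior summaries across both chains
gelman.diag(test)    # between-chain convergence check (needs >= 2 chains)
effectiveSize(test)  # effective sample size of the combined draws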