I've been looking into running R on EC2, but I'm wondering how parallel/cluster computing works with this setup. I've had a look around but haven't been able to find a tutorial for it.

Basically, what I'm looking to do is have R (RStudio) running on my laptop and do most of the work there, but when I have a big operation to run, explicitly pass it to an AWS slave instance to do all the heavy lifting.

As far as I can see, the snow/snowfall packages seem to be the answer, but I'm not really sure how to use them for this.

I'm using the tutorial at http://bioconductor.org/help/bioconductor-cloud-ami/ (the ssh one) to get R running. This tutorial does mention parallel/cluster computing, but it seems to be between different AWS instances.

Any help would be great. Cheers.

Ger

1 Answer

If you only need one slave instance, I've found it's easiest to run the job in parallel on the instance itself rather than using your PC as a master.

You can write the script on your PC, push it up to a multicore server that has R installed, and then run it there using all cores in parallel.

For example, upload this to a 4-core AWS instance:

library(snowfall)
sfInit(parallel=TRUE,cpus=4,slaveOutfile="log.txt")

vars = c(1:100)

#send variables to all processors
sfExportAll()

#Run this in parallel
results = sfLapply(vars, exp)

#Stop parallel processing
sfStop()

#save results
save(results, file = "results.RData")
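
If you'd rather not hard-code the core count, the same pattern can probably be written with the base `parallel` package instead of snowfall. This is only a minimal sketch under that assumption (it swaps in `parallel`, which is not what the script above uses), and `n_cores` is just an illustrative name:

```r
library(parallel)

# Ask the instance how many cores it has; fall back to 1 if unknown
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Start one local worker per core
cl <- makeCluster(n_cores)

vars <- 1:100

# Run this in parallel across the workers
results <- parLapply(cl, vars, exp)

# Stop the workers
stopCluster(cl)

# Save results so they can be copied back to the PC
save(results, file = "results.RData")
```

That way the same script should run unchanged if you move to a bigger or smaller instance type.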
Dirk Calloway