
I have several i7 desktops that I would like to use to speed up the computation time of the function `genoud` in the package `rgenoud`. In `genoud`, you can assign a cluster that was generated with the `parallel` package (and I assume `snow` as well).
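
For context, this is roughly how I plan to hand a cluster to `genoud` once the nodes are reachable (a minimal sketch; the hostnames and the toy objective function are placeholders):

```r
library(parallel)
library(rgenoud)

## Placeholder hostnames for the i7 desktops on the LAN.
cl <- makeCluster(c("node01", "node02"))

## Toy objective function, just to exercise the cluster.
fitness <- function(x) sum(x^2)

## genoud accepts a 'cluster' object via its cluster argument.
result <- genoud(fitness, nvars = 2, max = FALSE,
                 pop.size = 200, cluster = cl)

stopCluster(cl)
```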

What kind of clustering software would you recommend for this? I have tried Beowulf clusters, but the documentation for them is mostly outdated, so I am looking for a guide that shows how to do this and is still up to date.

This cluster is going to run over the LAN, and I have already assigned IPs to the nodes.

Thanks

  • What OS are these i7 running? It's pretty easy with Linux: set up ssh authentication and MPI and you're off to the races. If that is too much, put a Redis db somewhere and use doRedis (a sketch of that route follows this thread). And on and on ... – Dirk Eddelbuettel Dec 19 '17 at 18:00
  • Ubuntu. Is there a guide you would recommend for MPI? All the guides I could find were extremely outdated and were giving me errors as a result. – Keshav M Dec 19 '17 at 18:33
  • I use Ubuntu too (at work) and have it (had it) set up for slurm. There are / were a few related questions here, e.g. [this one](https://stackoverflow.com/questions/47002755/emulating-slurm-on-ubuntu-16-04/) and others by [@landau](https://stackoverflow.com/users/3704549/landau). You could come to the r-sig-hpc list too. One thing to keep in mind is OpenMPI 1.* (older packages) vs 2.* vs 3.* (upstream). – Dirk Eddelbuettel Dec 19 '17 at 18:38
  • When using Rmpi, I keep getting the error "MPI_Comm_spawn is not supported". I read that it had to do with using MPICH, so I installed OpenMPI, but I am still getting the error. Can you help me understand what is wrong? Sorry, I am not very knowledgeable about this. – Keshav M Dec 19 '17 at 19:30
  • Rmpi from `r-cran-rmpi` or directly installed? When do you get the error? Do you know that MPI requires `mpirun` / `orterun` around scripts and all that? – Dirk Eddelbuettel Dec 19 '17 at 19:32
  • When I try to initiate the cluster with `makeMPIcluster`, it gives me this error. I thought `mpirun`/`orterun` were only needed when running MPI from the terminal, not from R, but I am probably wrong about this (there is a launch sketch after this thread). Could you explain it? Thanks! – Keshav M Dec 19 '17 at 19:34
  • Not here, sorry. – Dirk Eddelbuettel Dec 19 '17 at 19:34
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/161553/discussion-between-keshav-m-and-dirk-eddelbuettel). – Keshav M Dec 19 '17 at 19:36
  • No, sorry, I do not have the bandwidth for one-on-one tutorials. – Dirk Eddelbuettel Dec 19 '17 at 19:38
  • If of any help, I'm on an up-to-date Ubuntu 16.04 and I never managed to get core Rmpi to work there. See https://stat.ethz.ch/pipermail/r-sig-hpc/2017-October/002069.html for my call for help on this. – HenrikB Dec 19 '17 at 23:03
  • To your original question: Using a scheduler (e.g. [Slurm](https://slurm.schedmd.com/), [SGE](https://arc.liv.ac.uk/trac/SGE)) is useful and probably a long-term requirement if multiple users use the cluster concurrently. But you can still get going quickly using an ad-hoc cluster created by `parallel::makeCluster(c("node1", "node2", "node2", "node3"))`, as long as R is installed on each node (sketched after this thread). That does not require MPI. Have you tried that? – HenrikB Dec 19 '17 at 23:10
  • I have tried this, but it just hangs at makeCluster each time. I left it overnight once and still nothing the next morning. Any idea what could be causing that? – Keshav M Dec 20 '17 at 02:17
  • Could be firewall issues (the workers need port forwarding back to your machine). Try with `cl <- future::makeClusterPSOCK(c("node1", "node2", "node2", "node3"), verbose = TRUE)` instead. It avoids the need for port forwarding, and the verbose output will give you a little bit more info too. Also, try with only one machine first and focus on that (a single-node sketch follows this thread). – HenrikB Dec 21 '17 at 01:24
  • I finally got `makeCluster` to semi-work, but after each process reaches 182.6 MB of RAM on node01, CPU usage drops to 0. Any idea why RAM usage would be capped like that on a machine with 64 GB of RAM? – Keshav M Dec 21 '17 at 03:57
  • `future::makeClusterPSOCK` also shows this same behavior. I am really confused about why that would be happening. – Keshav M Dec 21 '17 at 18:08
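
A minimal sketch of the doRedis route suggested above. It assumes a Redis server reachable at a placeholder address (`192.168.1.10` here) and the `doRedis` and `foreach` packages installed on every machine; note that `genoud` itself wants a `cluster` object, so this route suits `foreach`-style workloads rather than `genoud` directly.

```r
library(doRedis)
library(foreach)

## On the master: register the Redis-backed foreach backend.
## "jobs" and the host address are placeholders.
registerDoRedis(queue = "jobs", host = "192.168.1.10")

## On each worker node, start workers that poll the same queue, e.g.:
##   R -e 'doRedis::startLocalWorkers(n = 4, queue = "jobs", host = "192.168.1.10")'

## Work submitted via foreach is then spread across the workers.
res <- foreach(i = 1:8, .combine = c) %dopar% i^2

removeQueue("jobs")  # clean up the queue when done
```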
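On the `MPI_Comm_spawn` error discussed above: a common cause is starting R directly instead of launching it under `mpirun`, which is what the "MPI requires `mpirun`/`orterun` around scripts" comment alludes to. A sketch of that launch pattern, assuming Open MPI built with spawn support plus the `Rmpi` and `snow` packages (the script name and worker count are placeholders):

```r
## cluster_test.R -- launch this under mpirun instead of starting R directly:
##   mpirun -np 1 R --no-save -f cluster_test.R
library(Rmpi)
library(snow)

cl <- makeMPIcluster(4)  # spawns 4 R workers via MPI_Comm_spawn
print(clusterCall(cl, function() Sys.info()[["nodename"]]))

stopCluster(cl)
mpi.quit()  # shut MPI down cleanly rather than a plain q()
```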
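The ad-hoc SOCK cluster suggestion, sketched out. It assumes passwordless ssh from the master to every node and R installed on each machine; the hostnames are placeholders, and repeating a name (as with `node2`) starts two workers on that host. The resulting cluster object can be passed straight to `genoud`'s `cluster` argument.

```r
library(parallel)

## Placeholder hostnames; repeating "node2" launches two workers there.
cl <- makeCluster(c("node1", "node2", "node2", "node3"))

## Quick sanity check: which machine did each worker land on?
clusterCall(cl, function() Sys.info()[["nodename"]])

## The same cluster can then be handed to genoud, e.g.:
##   rgenoud::genoud(fitness, nvars = 2, cluster = cl)

stopCluster(cl)
```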
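And the single-machine debugging step from the last suggestion, as a sketch. `node1` is a placeholder; `future::makeClusterPSOCK()` sets up a reverse ssh tunnel by default, which is why no inbound port forwarding is needed, and `verbose = TRUE` prints each setup step so a hang is easier to localize.

```r
library(future)

## One placeholder node only -- get a single worker going before scaling up.
cl <- makeClusterPSOCK("node1", verbose = TRUE)

## Confirm the worker is alive and see where it is running.
parallel::clusterEvalQ(cl, Sys.info()[["nodename"]])

parallel::stopCluster(cl)
```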

0 Answers