I'm setting up R on an existing Hadoop cluster. So far I've installed the R RPMs and related library packages on one node of the cluster (the edge node), and it works as expected. Do the R RPMs need to be installed on all servers in the cluster, or can just the library directory (in my case /usr/lib64/R/library) be synced across all the servers?
- Goal is to execute Hadoop MR jobs via the R shell _(rmr, rhive, rhdfs should be accessible)_. Users will log in only to the edge node. – Baskar Jayakumar Oct 20 '15 at 17:22
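For reference, a minimal rmr2 job of the kind described above might look like the sketch below. It assumes rmr2 is installed and that the `HADOOP_CMD` and `HADOOP_STREAMING` environment variables point at the cluster's hadoop binary and streaming jar:

```r
library(rmr2)

# write a small vector to HDFS so the job has some input
small.ints <- to.dfs(1:1000)

# the map function executes as R processes on the task nodes,
# not on the edge node where the job is submitted
squares <- mapreduce(
  input = small.ints,
  map   = function(k, v) keyval(v, v^2))

# read the results back from HDFS
from.dfs(squares)
```

Because the `map` function runs as R on the task nodes, whatever it needs (R itself, rmr2, and its dependencies) has to be present on every node, which is the crux of the question.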
1 Answer
For rmr you need to install it everywhere; for rhdfs you don't; and for rhive I don't know. "Install" here means the R RPMs (or equivalent) plus the necessary dependencies. As for syncing library directories: I tried something similar to simplify the deployment of rmr2, but we (the client and I, in agreement) pulled the plug because it seemed a very brittle strategy, since it depends on all the libraries being perfectly identical. It worked in a very controlled environment, but we synced up the whole thing, not just the library.
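A rough sketch of that kind of whole-tree sync, run from the edge node. The worker hostnames are hypothetical, and it assumes passwordless ssh from the edge node to each worker:

```r
# hypothetical sketch: mirror the entire R installation (the R_HOME subtree),
# not just /usr/lib64/R/library, to every worker node via rsync over ssh
workers <- c("worker01", "worker02", "worker03")  # assumed hostnames
r_home  <- Sys.getenv("R_HOME")                   # e.g. /usr/lib64/R

for (w in workers) {
  system(paste0("rsync -a --delete ", r_home, "/ ", w, ":", r_home, "/"))
}
```

Note that this only has a chance of working if every node runs the same OS and system-library versions, which is exactly the brittleness mentioned above.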

– piccolbo
- Thanks Piccolbo. Regarding your statement "_but we synched up the whole thing, not just the library_": do you mean that the R core RPMs were installed on all nodes and the libraries were synced across all nodes as well? – Baskar Jayakumar Oct 20 '15 at 22:38
- Not sure what you mean by "core". We just shipped the whole subtree under R_HOME. You can still find that prototype in the 0-install branch of rmr2 (now outdated). – piccolbo Oct 21 '15 at 15:44
- I just ran across an article that explains how to create portable binaries: https://pmelsted.wordpress.com/2015/10/14/building-binaries-for-bioinformatics/. You may also want to consider this, to get R_HOME flexibility, which may be necessary to deploy on a cluster: http://fumodibit.blogspot.com/2013/04/modifying-r-to-obtain-relocatable.html – piccolbo Oct 21 '15 at 18:40