
I'm attempting to set up an Amazon Linux EC2 instance with MXNet and R (with the MXNet R package available as well). Unfortunately, this has been a lot harder than I expected.

I've attempted to follow the instructions from MXNet using Amazon's deep learning AMI with CUDA 8.0 on a p2.xlarge (https://mxnet.incubator.apache.org/get_started/install.html).

However, when I try to compile the MXNet R package, I get the same error as the one described in this SO post:

Issues installing mxnet GPU R package for Amazon deep learning AMI

The solutions discussed in that post are somewhat beyond my ability to fully test and debug, i.e. I'm not particularly familiar with Linux environment variables and which ones to modify. I've also reviewed some issues raised on the apache-incubator GitHub for MXNet, and those were not helpful either.

So my questions are:

  1. Is anyone aware of any available AMIs that come pre-packaged with R and MXNet? The ones I see seem to include only Python.
  2. Does anyone have a working set of instructions (or a script) to run on an Amazon Linux EC2 instance to install the dependencies required for the MXNet R package (assuming I'm using some type of deep learning AMI that comes with at least CUDA 8.0)?
Taran

1 Answer


Right, so I was the guy on the other post, and I DID eventually get it working. It took 50+ hours, and I'm not 100% sure where the issue was because... Linux.

sudo yum install R
sudo yum install libxml2-devel   
sudo yum install cairo-devel
sudo yum install giflib-devel
sudo yum install libXt-devel
sudo R
install.packages("devtools")
library(devtools)
install_github("igraph/rigraph")
install.packages(c("DiagrammeR", "roxygen2", "rgexf", "influenceR", "Cairo", "imager"))
cd
cd /src/mxnet
cp make/config.mk .
echo "USE_BLAS=openblas" >>config.mk
echo "ADD_CFLAGS += -I/usr/include/openblas" >>config.mk
echo "ADD_LDFLAGS += /usr/local/lib" >>config.mk
echo "USE_CUDA=1" >>config.mk
echo "USE_CUDA_PATH=/usr/local/cuda-9.0/lib64" >>config.mk
echo "USE_CUDNN=1" >>config.mk
*add another LD flag for /usr/local/lib

cd /etc/ld.so.conf.d/
sudo nano cuda.conf
# in the editor, add this line: /usr/local/cuda-9.0/lib64
cd
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
sudo ldconfig

cd R-package
Rscript -e "install.packages('devtools', repo = 'https://cran.rstudio.com')"
Rscript -e "library(devtools); library(methods);options(repos=c(CRAN='https://cran.rstudio.com'));install_deps(dependencies = TRUE)"
cd ..

sudo make rpkg
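
If `make rpkg` ends with a "cannot open shared object file" error, one way to narrow it down is to ask the dynamic linker which dependencies of the built library it cannot resolve. This is a minimal sketch, not part of the original steps: `missing_libs` is a hypothetical helper name, and the example path assumes the MXNet build left its library in `lib/libmxnet.so` under the source directory.

```shell
#!/usr/bin/env bash
# Sketch: list the shared libraries a binary needs but the dynamic linker cannot find.
# missing_libs is a hypothetical helper, not part of MXNet or R.
missing_libs() {
    # ldd prints "libfoo.so => not found" for every dependency it cannot resolve;
    # awk keeps only the library name from those lines.
    ldd "$1" 2>/dev/null | awk '/not found/ {print $1}' | sort -u
}

# Example (assumed path; adjust to wherever your build put the library):
# missing_libs ~/src/mxnet/lib/libmxnet.so
```

An empty result means every dependency was found with the current settings. Keep in mind that sudo typically strips LD_LIBRARY_PATH from the environment, so a value set with `export` may not be visible to `sudo make rpkg`; entries registered through /etc/ld.so.conf.d and `ldconfig` do not depend on the environment.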

THEN you gotta make sure R/RStudio can actually find those libraries:

cd /etc/rstudio
sudo nano rserver.conf

You can add elements to the default LD_LIBRARY_PATH for R sessions (as determined by the R ldpaths script) by adding an rsession-ld-library-path entry to the server config file. This might be useful for ensuring that packages can locate external library dependencies that aren't installed in the system standard library paths. For example:

rsession-ld-library-path=/opt/local/lib:/usr/local/cuda/lib64
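
The two config edits in this answer (cuda.conf and rserver.conf) can also be scripted instead of done by hand in nano. The following is only a sketch under assumptions: `append_line_once` is a hypothetical helper, and it needs write access to the target file, so for files under /etc it would have to run as root.

```shell
#!/usr/bin/env bash
# Sketch: add a line to a config file only if that exact line is not already present.
# append_line_once is a hypothetical helper, not a standard tool.
append_line_once() {
    local file="$1" line="$2"
    # grep flags: -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage (run as root; paths and values are the ones used in this answer):
# append_line_once /etc/ld.so.conf.d/cuda.conf "/usr/local/cuda-9.0/lib64"
# append_line_once /etc/rstudio/rserver.conf "rsession-ld-library-path=/opt/local/lib:/usr/local/cuda/lib64"
# ldconfig
```

Running it twice leaves the file unchanged the second time, which makes it safe to re-run while debugging.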
Garglesoap
  • Thanks for your reply @Garglesoap. As a starting point, was this on a clean Linux install, or were you working with an AMI (assuming this was done on AWS)? If so, can you tell me what was pre-installed before you got started? – Taran Nov 21 '17 at 15:07
  • Hey @Garglesoap. I made a couple of modifications to the instructions above to get them to work. However, the steps you describe above lead to the same error when I run the sudo make rpkg command, i.e. `Error: package or namespace load failed for ‘mxnet’: .onLoad failed in loadNamespace() for 'mxnet', details: call: dyn.load(file, DLLpath = DLLpath, ...) error: unable to load shared object '/usr/lib64/R/library/mxnet/libs/libmxnet.so': libmklml_intel.so: cannot open shared object file: No such file or directory` – Taran Nov 21 '17 at 21:25
  • Specifically, I updated the cd /src/mxnet command to cd src/mxnet since I don't have a /src directory. Also, I'm not sure what you meant by "*add another LD flag for /usr/local/lib", and since you had already added it to config.mk I did nothing extra. Also, when I created cuda.conf, all it contains is /usr/local/cuda-9.0/lib64. Finally, I had to use sudo Rscript, since plain Rscript didn't work (I get a 'lib = "/usr/lib64/R/library"' is not writable error without sudo) – Taran Nov 21 '17 at 21:29
  • This was on the AWS Deep Learning AMI. It has CUDA, cuDNN, and MXNet pre-installed, but not R, RStudio, or the MXNet R package. – Garglesoap Nov 22 '17 at 16:33
  • At least you're getting the same error as me. I had to add the LD library path in all 3 locations, since none seemed to work on their own (config.mk, cuda.conf, and the export commands). The "*add another LD flag" comment was, I think, a note to myself. You can also try adding the library path to your make command: sudo LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 make rpkg – Garglesoap Nov 22 '17 at 17:00
  • No luck running it with the LD_LIBRARY_PATH in the make command. I get the same error – Taran Nov 27 '17 at 20:14
  • Really? Adding the cuda path to the make command slightly changed the error for me, as it found the cuda folder but couldn't find the local/lib. You tried it in conjunction with the other 3 lib path additions? – Garglesoap Nov 27 '17 at 20:26
  • Yes. The error appears to be more fundamental. There are a few places in your steps where I had to deviate, which is likely why I'm not able to reproduce the same error. For instance, my instance (which also uses the deep learning AMI) doesn't have /src/mxnet. The src/mxnet directory is located in the ec2-user home directory. On a different note, when I examine the location that the error points to, /usr/lib64/R/library/mxnet/ doesn't exist at all – Taran Nov 27 '17 at 20:33
  • Well, if we're both using the AWS CUDA 9 AMI, it should be the same. When you SSH in, you log in as ec2-user, correct? When I SSH in as ec2-user, ls shows: AmazonLinuxCuda9.README.md, Nvidia_Cloud_EULA.pdf, and src – Garglesoap Nov 27 '17 at 20:37
  • Yep. When I ssh into the instance, ls shows the src dir, which contains mxnet, theano, etc. That means they are all located in /home/ec2-user/src/ and not in /src – Taran Nov 27 '17 at 20:42
  • Going back to your instructions, what exactly is cuda.conf supposed to contain? Mine says `/usr/local/cuda-9.0/lib64`. Is it supposed to state something like `export /usr/local/cuda-9.0/lib64`? – Taran Nov 27 '17 at 20:44
  • Also, after I cd into R-package, I can only run the following commands with sudo: `Rscript -e "install.packages('devtools', repo = 'https://cran.rstudio.com')" Rscript -e "library(devtools); library(methods);options(repos=c(CRAN='https://cran.rstudio.com'));install_deps(dependencies = TRUE)"` – Taran Nov 27 '17 at 20:45
  • OK, /home/ec2-user/src/ is correct, cuda.conf is correct with /usr/local/cuda-9.0/lib64, and the R-package commands also look right, although maybe you need sudo in front. Can you post the bottom of your config.mk file, and also try rerunning the export LD_LIBRARY_PATH commands and sudo ldconfig? – Garglesoap Nov 27 '17 at 20:48
  • `# whether to use sframe integration. This requires build sframe # git@github.com:dato-code/SFrame.git # SFRAME_PATH = $(HOME)/SFrame # MXNET_PLUGINS += plugin/sframe/plugin.mk USE_BLAS=openblas ADD_CFLAGS += -I/usr/include/openblas ADD_LDFLAGS += /usr/local/lib USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda-9.0/lib64 USE_CUDNN=1` – Taran Nov 27 '17 at 21:27
  • Ok this is a bit different, I have # MXNET_PLUGINS += plugin/sframe/plugin.mk USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda-9.0/lib64 USE_CUDNN=1 USE_DIST_KVSTORE=1 USE_MKL2017=1 USE_BLAS=openblas USE_S3=1 CUDA_ARCH := -gencode arch=compute_35,code=sm_35 -gencode arch=compute_52,code=sm_52 -gencod$ – Garglesoap Nov 27 '17 at 21:31
  • Also, above that, these are the relevant settings (some I changed, some I left unchanged): ADD_LDFLAGS = /usr/local/lib, USE_CUDA=1, USE_CUDA_PATH = /usr/local/cuda-9.0/lib64, USE_CUDNN=1, USE_OPENCV=1, USE_OPENMP=1, MKLML_ROOT=/usr/local – Garglesoap Nov 27 '17 at 21:34
  • I changed everything that differed between my file and yours, except for CUDA_ARCH, which I don't see in config.mk, and I'm still getting an error. It seems different, though: `Error: package or namespace load failed for ‘mxnet’: .onLoad failed in loadNamespace() for 'mxnet', details: call: dyn.load(file, DLLpath = DLLpath, ...) error: unable to load shared object '/usr/lib64/R/library/mxnet/libs/libmxnet.so': libmklml_intel.so: cannot open shared object file: No such file or directory` – Taran Nov 27 '17 at 21:44
  • Progress! This error means make cannot find the /usr/local/lib dir, but at least it is finding your cuda directory. If you used the 'make LD path rpkg' command, then both environment variables are still not set correctly. If you did 'make rpkg', then for some reason it's finding cuda and not /usr/local/lib. I still think your issue is at the "export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH sudo ldconfig" part. Can you confirm that after the two export commands the sudo ldconfig actually does something? – Garglesoap Nov 27 '17 at 22:35
  • Thanks for your help, but I'm holding off on putting any more hours into this issue. I was able to use kerasR for what I needed. Perhaps the good people at Apache will resolve these issues before I need to revisit this. – Taran Dec 04 '17 at 15:40
  • For what it's worth, I tried the install again using the Ubuntu AWS AMI, and it worked like a charm. Much easier install. MXNet is running with no problems on RStudio Server. Lemme know if you're interested. – Garglesoap Jan 09 '18 at 00:17
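
For anyone retracing the Amazon Linux attempt: the question raised in the comments, whether `sudo ldconfig` actually registered the new directories, can be checked by looking for the libraries in the linker cache. This is a hedged sketch; `lib_in_cache` is a hypothetical helper name, and the library names are the one from the error message plus the CUDA runtime.

```shell
#!/usr/bin/env bash
# Sketch: report whether a library name appears in the dynamic linker cache.
# lib_in_cache is a hypothetical helper, not a standard tool.
lib_in_cache() {
    local ldc
    # ldconfig often lives in /sbin, which may not be on a regular user's PATH
    ldc="$(command -v ldconfig || echo /sbin/ldconfig)"
    # ldconfig -p prints every library currently in the cache
    "$ldc" -p 2>/dev/null | grep -qF -- "$1"
}

for lib in libmklml_intel.so libcudart.so; do
    if lib_in_cache "$lib"; then
        echo "$lib: in linker cache"
    else
        echo "$lib: NOT in linker cache"
    fi
done
```

If a library is reported as missing after `sudo ldconfig`, the directory that contains it is probably not listed in any file under /etc/ld.so.conf.d.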