
I'm trying to install MXNet with GPU support on Colab.

Colab currently seems to ship with CUDA 11.1 by default, since

!nvcc --version

gives

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
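
For reference, the release number in that output can be pulled out programmatically and turned into a candidate wheel name. This is only a sketch: the helper names are made up for illustration, and the `mxnet-cuXYZ` naming pattern is inferred from package names such as mxnet-cu112 and mxnet-cu102. Building a name this way does not mean a wheel with that name is actually published.

```python
import re


def cuda_release(nvcc_output: str) -> str:
    """Return the CUDA release (e.g. '11.1') found in `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def mxnet_wheel_name(release: str) -> str:
    """Build a wheel name following the assumed mxnet-cuXYZ pattern."""
    return "mxnet-cu" + release.replace(".", "")


# Sample line copied from the output above.
sample = "Cuda compilation tools, release 11.1, V11.1.105"
release = cuda_release(sample)
print(release, mxnet_wheel_name(release))
```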

I've tried 3 different approaches to achieve the goal but none of them worked.

First try - cuda 11.2, local deb

First, I tried this set of commands from the NVIDIA docs:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda-repo-ubuntu1804-11-2-local_11.2.0-460.27.04-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-2-local_11.2.0-460.27.04-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-2-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

The installation completed without errors, but I ended up with the latest version of CUDA, 11.4, instead of 11.2.
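
A likely reason, stated as an assumption rather than something verified on Colab: the unversioned `cuda` meta-package resolves to the newest toolkit available in the configured repositories. NVIDIA's apt repositories also carry versioned packages whose names use dashes, such as `cuda-toolkit-11-2`. The hypothetical helpers below only build that package name and the matching install command; they do not run anything.

```python
def versioned_toolkit_package(release: str) -> str:
    """Map a release such as '11.2' or '11.2.0' to 'cuda-toolkit-11-2'."""
    major, minor = release.split(".")[:2]
    return f"cuda-toolkit-{major}-{minor}"


def apt_install_command(release: str) -> str:
    """Return the apt-get command that pins the toolkit to one release."""
    return f"apt-get -y install {versioned_toolkit_package(release)}"


print(apt_install_command("11.2"))
```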

Second try - cuda 11.2, runfile

Next, I tried the runfile installer:

!wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
!sh ./cuda_11.2.2_460.32.03_linux.run --toolkit --silent --override

The installation went well, and CUDA 11.2 appears to be installed, since this command

!nvcc --version

gives

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

and then I ran this command

!pip install mxnet-cu112

and got

Collecting mxnet-cu112
  Downloading mxnet_cu112-1.8.0.post0-py2.py3-none-manylinux2014_x86_64.whl (495.7 MB)
     |████████████████████████████████| 495.7 MB 15 kB/s 
Collecting graphviz<0.9.0,>=0.8.1
  Downloading graphviz-0.8.4-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: numpy<2.0.0,>1.16.0 in /usr/local/lib/python3.7/dist-packages (from mxnet-cu112) (1.19.5)
Requirement already satisfied: requests<3,>=2.20.0 in /usr/local/lib/python3.7/dist-packages (from mxnet-cu112) (2.23.0)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.20.0->mxnet-cu112) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.20.0->mxnet-cu112) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.20.0->mxnet-cu112) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.20.0->mxnet-cu112) (2021.5.30)
Installing collected packages: graphviz, mxnet-cu112
  Attempting uninstall: graphviz
    Found existing installation: graphviz 0.10.1
    Uninstalling graphviz-0.10.1:
      Successfully uninstalled graphviz-0.10.1
Successfully installed graphviz-0.8.4 mxnet-cu112-1.8.0.post0

Finally, I tested the installation with this command

import mxnet as mx

and I got the libnvrtc error

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-7-265f02e9c062> in <module>()
----> 1 import mxnet as mx

4 frames
/usr/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    362 
    363         if handle is None:
--> 364             self._handle = _dlopen(self._name, mode)
    365         else:
    366             self._handle = handle

OSError: libnvrtc.so.11.2: cannot open shared object file: No such file or directory

So, I tried to check the existence of the library

!find /usr/ -name "libnvrtc*"

and I got

/usr/local/lib/python3.7/dist-packages/torch/lib/libnvrtc-08c4863f.so.10.2
/usr/local/lib/python3.7/dist-packages/torch/lib/libnvrtc-builtins.so
/usr/local/lib/python2.7/dist-packages/torch/lib/libnvrtc-5e8a26c9.so.10.1
/usr/local/lib/python2.7/dist-packages/torch/lib/libnvrtc-builtins.so
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libnvrtc.so.11.2
/usr/local/cuda-11.2/targets/x86_64-linux/lib/nvrtc-prev/libnvrtc.so.11.2
/usr/local/cuda-11.2/targets/x86_64-linux/lib/nvrtc-prev/libnvrtc.so
/usr/local/cuda-11.2/targets/x86_64-linux/lib/nvrtc-prev/libnvrtc-builtins.so.11.2.152
/usr/local/cuda-11.2/targets/x86_64-linux/lib/nvrtc-prev/libnvrtc.so.11.2.152
/usr/local/cuda-11.2/targets/x86_64-linux/lib/nvrtc-prev/libnvrtc-builtins.so.11.2
/usr/local/cuda-11.2/targets/x86_64-linux/lib/nvrtc-prev/libnvrtc-builtins.so
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libnvrtc.so
/usr/local/cuda-11.2/targets/x86_64-linux/lib/stubs/libnvrtc.so
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.2.152
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libnvrtc.so.11.2.152
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.2
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libnvrtc-builtins.so
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.0
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc.so.11.0.221
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc.so
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.0.221
/usr/local/cuda-11.0/targets/x86_64-linux/lib/stubs/libnvrtc.so
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc.so.11.0
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc-builtins.so
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libnvrtc.so
/usr/local/cuda-11.1/targets/x86_64-linux/lib/stubs/libnvrtc.so
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libnvrtc.so.11.1
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.1.105
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.1
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libnvrtc.so.11.1.105
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libnvrtc-builtins.so
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.0.130
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libnvrtc.so
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libnvrtc.so.10.0.130
/usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs/libnvrtc.so
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.0
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libnvrtc.so.10.0
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libnvrtc-builtins.so
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libnvrtc.so
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libnvrtc.so.10.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/stubs/libnvrtc.so
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libnvrtc.so.10.1.243
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.1.243
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libnvrtc-builtins.so

and another command

%ll /usr/local/cuda/lib64/libnvrtc*

gives

lrwxrwxrwx 1 root       25 Sep 22 00:58 /usr/local/cuda/lib64/libnvrtc-builtins.so -> libnvrtc-builtins.so.11.2*
lrwxrwxrwx 1 root       29 Sep 22 00:57 /usr/local/cuda/lib64/libnvrtc-builtins.so.11.2 -> libnvrtc-builtins.so.11.2.152*
-rwxr-xr-x 1 root  6122648 Sep 22 00:57 /usr/local/cuda/lib64/libnvrtc-builtins.so.11.2.152*
lrwxrwxrwx 1 root       16 Sep 22 00:58 /usr/local/cuda/lib64/libnvrtc.so -> libnvrtc.so.11.2*
lrwxrwxrwx 1 root       20 Sep 22 00:57 /usr/local/cuda/lib64/libnvrtc.so.11.2 -> libnvrtc.so.11.2.152*
-rwxr-xr-x 1 root 43954832 Sep 22 00:57 /usr/local/cuda/lib64/libnvrtc.so.11.2.152*

Does this mean I already have the library that mxnet-cu112 needs?
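
A file being present on disk and the dynamic loader being able to find it are two different things. A minimal sketch of checking the second one from inside the running Python process follows. The function name `loader_can_open` is made up for illustration, and the check reflects only the search paths of the process it runs in.

```python
import ctypes


def loader_can_open(soname: str) -> bool:
    """Return True if the dynamic loader of this process can open `soname`."""
    try:
        ctypes.CDLL(soname)  # searched by bare name, like a dependency would be
    except OSError:
        return False
    return True


# The soname is taken from the error message above.
print(loader_can_open("libnvrtc.so.11.2"))
```

If this prints False while the file is listed by find, the library exists but is not on the loader's search path for the current process.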

I tried to point mxnet at the directory where "libnvrtc.so.11.2" is located,

%env LD_LIBRARY_PATH=/usr/local/cuda/lib64/

but it didn't work either.
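
One possible explanation, offered as an assumption and not verified on Colab: with glibc, the dynamic loader reads LD_LIBRARY_PATH when a process starts, so changing it inside an already running Python kernel does not change where later library loads look. A workaround sometimes used in that situation is to load the needed libraries by absolute path before importing mxnet, so that the loader can reuse them by soname. The helper name `preload_libs` is hypothetical, and the list of sonames is illustrative: only libnvrtc.so.11.2 comes from the error above, and more libraries may be needed.

```python
import ctypes
import os


def preload_libs(lib_dir, sonames):
    """Load each library that exists in lib_dir by absolute path.

    Returns the paths that were loaded. Missing files are skipped, so the
    result shows which of the requested libraries were actually found.
    """
    loaded = []
    for soname in sonames:
        path = os.path.join(lib_dir, soname)
        if not os.path.exists(path):
            continue
        # RTLD_GLOBAL makes the symbols visible to libraries loaded later.
        ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
        loaded.append(path)
    return loaded


# Directory taken from the find output above.
loaded = preload_libs("/usr/local/cuda-11.2/targets/x86_64-linux/lib",
                      ["libnvrtc.so.11.2"])
print(loaded)
# `import mxnet as mx` would follow here once the preload has succeeded.
```

Running `ldconfig` with that directory as an argument is another option people use for the same purpose; neither route has been confirmed for this setup.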

I also tried this

!apt-get install -y libnvrtc=11.2

and I got this

Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package libnvrtc

How do I fix the "libnvrtc" error?

Third try - cuda 10.2

I factory-reset the runtime and tried these commands:

!wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
!sh ./cuda_10.2.89_440.33.01_linux.run --toolkit --silent --override
!pip install mxnet-cu102

Everything went well until this command

import mxnet as mx

gives

OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory

and this command

%ll /usr/local/cuda/lib64/libcudart*

gives this

lrwxrwxrwx 1 root     17 Sep 22 01:36 /usr/local/cuda/lib64/libcudart.so -> libcudart.so.10.2*
lrwxrwxrwx 1 root     20 Sep 22 01:35 /usr/local/cuda/lib64/libcudart.so.10.2 -> libcudart.so.10.2.89*
-rwxr-xr-x 1 root 509248 Sep 22 01:35 /usr/local/cuda/lib64/libcudart.so.10.2.89*
-rw-r--r-- 1 root 902366 Sep 22 01:36 /usr/local/cuda/lib64/libcudart_static.a

I also tried the suggestions in this thread, but none of them worked for me.

How do I fix the error?

Another possible solution might be to install a different version of mxnet, though there seems to be no mxnet binary for CUDA 11.1.

talonmies
JJJohn
    The mxnet version you have installed was clearly built against and requires CUDA 11.2. It even has it in the name of the package. You don't have CUDA 11.2 installed. You have CUDA 11.1 installed. – talonmies Sep 19 '21 at 08:17

1 Answer


The following approach works for cuda-10.0 and cuda-11.0:

!sudo ln -sfT /usr/local/cuda-10.0/ /usr/local/cuda
!pip install mxnet-cu100mkl

import mxnet
mxnet.__version__

For cuda-11.0, just replace the first two lines with:

!sudo ln -sfT /usr/local/cuda-11.0/ /usr/local/cuda
!pip install mxnet-cu110
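
To confirm which toolkit the symlink ends up pointing at after the `ln` step, something like the sketch below may help. It assumes the toolkit directories are named `cuda-X.Y`, as in the find output in the question, and the function name is made up for illustration.

```python
import os
import re


def active_cuda_release(link="/usr/local/cuda"):
    """Return the X.Y release that the CUDA symlink resolves to, or None."""
    # realpath follows the whole symlink chain and drops any trailing slash.
    target = os.path.realpath(link)
    match = re.search(r"cuda-(\d+\.\d+)$", target)
    return match.group(1) if match else None


print(active_cuda_release())
```

The printed release should match the suffix of the installed wheel, for example 11.0 for mxnet-cu110.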
user1635327