9

I access high-performance computing nodes remotely. I am not sure whether the NVIDIA Collective Communications Library (NCCL) is installed in my directory. Is there a way to check whether NCCL is installed?

Ahmad

2 Answers

17

You can try

locate nccl| grep "libnccl.so" | tail -n1 | sed -r 's/^.*\.so\.//'

or if you use PyTorch:

python -c "import torch;print(torch.cuda.nccl.version())"
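If the one-liner fails because PyTorch is missing or was built without NCCL support, a slightly longer script can report that instead of a traceback. The following is a minimal sketch, not a definitive check: the helper names `format_nccl_version` and `pytorch_nccl_version` are made up for illustration, and the value it prints is the NCCL version PyTorch was built with, which may differ from a system-wide install.

```python
def format_nccl_version(raw):
    """Turn the value reported by PyTorch into a printable string.

    Depending on the PyTorch release, the value may be a tuple such as
    (2, 10, 3) or a single integer; integers are printed unchanged.
    """
    if isinstance(raw, (tuple, list)):
        return ".".join(str(part) for part in raw)
    return str(raw)


def pytorch_nccl_version():
    """Return the NCCL version seen by PyTorch, or None if unavailable."""
    try:
        import torch  # imported lazily so the script runs without PyTorch
        return format_nccl_version(torch.cuda.nccl.version())
    except (ImportError, AttributeError, RuntimeError):
        # PyTorch is not installed, or this build has no NCCL support.
        return None


if __name__ == "__main__":
    version = pytorch_nccl_version()
    if version is None:
        print("NCCL version not available through PyTorch")
    else:
        print("NCCL version reported by PyTorch:", version)
```

A `None` result only means PyTorch could not report a version; NCCL may still be installed elsewhere on the node.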

See the linked Command Cheatsheet: Checking Versions of Installed Software / Libraries / Tools for Deep Learning on Ubuntu.

For containers, where locate is sometimes unavailable, you can replace it with ldconfig -v:

ldconfig -v | grep "libnccl.so" | tail -n1 | sed -r 's/^.*\.so\.//'
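The same pipeline can be sketched in Python, which makes the "nothing found" case explicit instead of printing an empty line. This is only an illustration under some assumptions: the helper names are hypothetical, `ldconfig` is assumed to be on the `PATH` or at `/sbin/ldconfig`, and only libraries in the linker's search directories are seen (a copy of NCCL in a home directory or a conda environment would be missed).

```python
import re
import shutil
import subprocess

# Matches names such as libnccl.so.2 or libnccl.so.2.7.8 and captures the suffix.
_NCCL_PATTERN = re.compile(r"libnccl\.so\.(\d+(?:\.\d+)*)")


def nccl_versions_from_listing(listing):
    """Return distinct libnccl.so.* version suffixes, most specific first."""
    found = set(_NCCL_PATTERN.findall(listing))
    return sorted(found, key=lambda v: (-v.count("."), v))


def installed_nccl_versions():
    """Run `ldconfig -v` and extract NCCL versions; empty list if none found."""
    ldconfig = shutil.which("ldconfig") or "/sbin/ldconfig"
    try:
        result = subprocess.run(
            [ldconfig, "-v"],
            capture_output=True,  # warnings on stderr are captured and ignored
            text=True,
            check=False,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ldconfig is missing or did not finish; treat as "nothing found".
        return []
    return nccl_versions_from_listing(result.stdout)


if __name__ == "__main__":
    versions = installed_nccl_versions()
    if versions:
        print("Found libnccl versions:", ", ".join(versions))
    else:
        print("No libnccl.so found in the linker search directories")
```

An empty result here corresponds to the shell pipeline printing nothing, which usually means NCCL is not installed system-wide.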
Sadra
  • 2
    When I enter `locate nccl| grep "libnccl.so" | tail -n1 | sed -r 's/^.*\.so\.//'`, it shows nothing. – Ahmad Apr 07 '21 at 11:45
-4

You can usually do this in the command line:

nvcc --version

You might also have to run:

sudo apt install nvidia-cuda-toolkit


As the other answer mentioned, you can call:

torch.cuda.nccl.version()

in PyTorch. Copy and paste this into your terminal:

python -c "import torch;print(torch.cuda.nccl.version())"

I expect there is something similar in TensorFlow.

Charlie Parker
  • NVCC is a general CUDA C++ compiler. It doesn't report NCCL (communications library) version. The first part of the answer is wrong. – Dima Mironov Sep 07 '21 at 08:33