12

I've set up an Ubuntu 18.04 server with nvidia-smi 418.39, driver version 418.39, and CUDA 10.1.

I now have a user who wants to run TensorFlow but insists that it is not compatible with CUDA 10.1, only CUDA 10. I can't find any statement confirming this online, nor anything about it in the TF release notes. Because setting this system up was kind of a pain, I'm a little hesitant to try downgrading just one version.

Can anyone confirm whether TensorFlow 1.12 works with CUDA 10.1?

talonmies
Eric Berry
  • Surely what you need is here: https://www.tensorflow.org/install/gpu This is also basically a duplicate: https://stackoverflow.com/q/53591511/1531971 –  Feb 28 '19 at 19:22
  • Being with CUDA 10.1 is not as easy as building with CUDA 10.0, as parts of CUDA have been renamed or moved around in 10.1. See this issue in github: https://github.com/tensorflow/tensorflow/issues/26150 – William D. Irons Feb 28 '19 at 19:30
  • Above I meant to say "Building with CUDA 10.1..." Anyway, the short answer is that it would be much easier to downgrade to CUDA 10.0 and use TensorFlow 1.13.1 with CUDA 10 support than to try to compile TensorFlow from source with CUDA 10.1 – William D. Irons Feb 28 '19 at 19:42
  • I understand, that's enough info for me (seeing SO and GH veterans having trouble is not something I want to sift through myself). I will be reverting to CUDA 10.0 and proceeding with TF 1.13. Thanks for the quick replies! – Eric Berry Feb 28 '19 at 19:49
  • Well, it seems that TF 2.0 doesn't support CUDA 10.1 yet either. – zezhong ren Jul 14 '19 at 03:56
  • I'm voting to close this question as off-topic because we are not a support desk for your favorite company/project. – Luuklag Jul 24 '19 at 19:31

3 Answers

6

I can confirm that even TF 1.13.1 only works with CUDA 10.0 for me, not 10.1. I don't know whether a symlink would work, though. If you try to run TF 1.13.1 on CUDA 10.1, it gives you "ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory".
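If you want to check for this before importing TensorFlow, here is a minimal sketch that asks the dynamic loader for the library named in the error message. The helper name can_load is made up for illustration; the only library name used is the one from the ImportError above.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the shared library `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library is not found or cannot be opened.
        return False


if __name__ == "__main__":
    # Library name taken from the ImportError quoted in this answer.
    soname = "libcublas.so.10.0"
    print(soname, "is loadable" if can_load(soname) else "is NOT loadable")
```

If this reports the library as not loadable, importing a TensorFlow build that links against it will most likely fail in the same way.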

4

TensorFlow 1.12 (and even the later versions 1.13.1 and 2.0.0-alpha0) could not be built against CUDA 10.1, so it can be considered incompatible.

I have tried building TensorFlow from source with GPU support. The TensorFlow versions I tried were 1.13.1 and 2.0.0-alpha0. The machine I used runs CentOS 7.6 with GCC 4.8.5. I have NVIDIA driver version 418.67 installed (release date 2019.5.7, supports CUDA Toolkit 10.1).

I succeeded in building both TensorFlow versions with CUDA 10.0 and cuDNN 7.6.0 + NCCL 2.4.7 (for CUDA 10.0). Note that you don't need to have the GPU attached to the machine (especially if you're using a VM in the cloud) while you're building TensorFlow with GPU support.

However, when I switched to CUDA 10.1 and cuDNN 7.6.0 + NCCL 2.4.7 (for CUDA 10.1), neither of these TensorFlow versions could be built. Besides the change in the location of libcublas, another source of the error is that no libcudart.so* files are found in cuda-10.1/lib64/ (while they do exist in cuda-10.0/lib64/).
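To see the libcudart difference on your own machine, something like the sketch below lists the libcudart.so* files in each toolkit's lib64 directory. The /usr/local/ prefix is an assumption (the usual default install location); adjust it to wherever your cuda-10.0 and cuda-10.1 directories live. The function name find_cudart is made up for illustration.

```python
import glob
import os


def find_cudart(lib_dir):
    """Return the sorted file names in lib_dir that match libcudart.so*."""
    pattern = os.path.join(lib_dir, "libcudart.so*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    # Assumed install prefixes; change these if your toolkits live elsewhere.
    for lib_dir in ("/usr/local/cuda-10.0/lib64", "/usr/local/cuda-10.1/lib64"):
        found = find_cudart(lib_dir)
        print(lib_dir, "->", found if found else "no libcudart.so* found")
```

An empty result for a directory means either the files are missing there or the directory itself does not exist.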

TDT

2

I can also confirm that TF 1.13.1 does not work with CUDA 10.1. When importing tensorflow, you will get the following error:

ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

Running ldconfig -v shows the difference: TensorFlow looks for libcublas.so.10.0, while the installed library is libcublas.so.10.1.0.105.
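If the ldconfig output is too long to scan by eye, a rough sketch like this pulls out just the libcublas names. It uses ldconfig -p (print the cache, no root needed) instead of -v, and the function name cublas_names is made up for illustration.

```python
import re
import subprocess


def cublas_names(ldconfig_output):
    """Return the distinct libcublas.so* names found in ldconfig output, sorted."""
    # Matches libcublas.so followed by any number of ".<digits>" components.
    return sorted(set(re.findall(r"libcublas\.so(?:\.\d+)*", ldconfig_output)))


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["ldconfig", "-p"], capture_output=True, text=True
        ).stdout
    except FileNotFoundError:
        # ldconfig is often in /sbin, which may not be on a normal user's PATH.
        out = ""
    print(cublas_names(out))
```

If libcublas.so.10.0 is not in the printed list, that matches the ImportError above.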

fisakhan