
Can anyone please help me understand the NVIDIA 30-series GPUs with the Ampere architecture and which CUDA versions are compatible with them?

From here and from all over the net I understand that support for Ampere was added in CUDA Toolkit v11:

https://forums.developer.nvidia.com/t/can-rtx-3080-support-cuda-10-1/155849

What I don't understand is how that fits together with this:

https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html

Section

"1.3.1. Applications Built Using CUDA Toolkit 10.2 or Earlier"

So, is it possible or not with CUDA 10.1?

Thank you very very much

Stav Bodik

1 Answer


Note the sentence

CUDA applications built using CUDA Toolkit versions 2.1 through 10.2 are compatible with NVIDIA Ampere architecture based GPUs *as long as they are built to include PTX versions*

(emphasis mine)

Plus the explanation in the section above.

When a CUDA application launches a kernel on a GPU, the CUDA Runtime determines the compute capability of the GPU in the system and uses this information to find the best matching cubin or PTX version of the kernel. If a cubin compatible with that GPU is present in the binary, the cubin is used as-is for execution. Otherwise, the CUDA Runtime first generates compatible cubin by JIT-compiling the PTX and then the cubin is used for the execution. If neither compatible cubin nor PTX is available, kernel launch results in a failure.

In effect: the CUDA toolkit remains ABI-compatible between 2.1 and 11, so an application built with an old version will continue to load at runtime. The CUDA runtime will then detect that your kernels are built for a version that is not compatible with Ampere. So it will take the PTX and compile a new version at runtime.
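To make that concrete, here is a minimal sketch (my own illustration, not code from the compatibility guide) that prints the GPU's compute capability and checks whether a kernel launch failed because the binary contains neither a compatible cubin nor PTX:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel; whether it can run on a given GPU depends on which
// cubin/PTX versions were embedded into the binary at build time.
__global__ void dummyKernel() {}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    std::printf("GPU %s, compute capability %d.%d\n",
                prop.name, prop.major, prop.minor);

    dummyKernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();

    if (err == cudaErrorNoKernelImageForDevice) {
        // Neither a compatible cubin nor PTX was found in the binary:
        // the "kernel launch results in a failure" case quoted above.
        std::printf("No compatible cubin or PTX embedded for this GPU\n");
    } else if (err != cudaSuccess) {
        std::printf("Launch failed: %s\n", cudaGetErrorString(err));
    } else {
        cudaDeviceSynchronize();
        std::printf("Kernel ran (possibly after JIT-compiling embedded PTX)\n");
    }
    return 0;
}
```

If the same file is built with an `arch=compute_XX,code=compute_XX` entry so that PTX is embedded, the launch succeeds on newer GPUs after the driver JIT-compiles that PTX.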

As noted in the comments, only a current driver is required on the production system for this to work.
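A quick way to see what such a production machine actually provides is to query the driver and runtime versions. This is just a small sanity-check sketch using the standard `cudaDriverGetVersion` / `cudaRuntimeGetVersion` runtime calls:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;

    // Highest CUDA version supported by the installed display driver
    cudaDriverGetVersion(&driverVersion);
    // CUDA runtime version the application was built/linked against
    cudaRuntimeGetVersion(&runtimeVersion);

    std::printf("Driver supports CUDA %d.%d, application uses runtime %d.%d\n",
                driverVersion / 1000, (driverVersion % 1000) / 10,
                runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    // For the JIT path described above, the driver must be recent enough
    // for the GPU; the runtime/toolkit the app was built with can be older.
    return 0;
}
```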

Homer512
  • Thanks for this. I have a C++ project using Toolkit version 10.1 with .cu files that contain kernel functions, and it is not building .ptx files: NVCC Compilation type "Generate hybrid object file (--compile)". But I still see that it is running and using the GPU, how do you explain that? – Stav Bodik Dec 01 '22 at 09:44
  • Is it maybe possible due to the newest drivers being installed? "your system needs toolkit version 11 to be installed but you don't need version 11 on the system that builds your application." On my production system there is no toolkit installed, only the latest NVIDIA drivers and the CUDA 10.1 DLLs. – Stav Bodik Dec 01 '22 at 09:55
  • @StavBodik Well, you need the CUDA-11 DLLs since that is the CUDA runtime. As for compilation: Section 1.4.1 in the compatibility guide you linked above shows the compile flags that you should use to incorporate the PTX for upward compatibility with CUDA-11 – Homer512 Dec 01 '22 at 10:35
  • @StavBodik The Nvidia drivers contain an inbuilt version of ptxas. – Sebastian Dec 01 '22 at 11:01
  • `NVCC Compilation type "Generate hybrid object file (--compile)" ` does not determine whether PTX will be generated or not. That is controlled by a separate switch (arch flags) for the compiler. If you are building your own CUDA device code using CUDA toolkit 10.1 and it is running correctly on an Ampere GPU, then your binary certainly is using PTX. You can confirm the presence/existence of PTX in your executable using the `cuobjdump` tool (see the sketch after this comment thread). You may also wish to study one of the many questions on SO about the general CUDA compilation flow and the significance of the arch flags when compiling. – Robert Crovella Dec 01 '22 at 15:40
  • @Sebastian What do you mean by inbuilt, please? Isn't it files that my compiler needs to generate from my code in the .cu files? – Stav Bodik Dec 04 '22 at 12:01
  • @RobertCrovella Yes, I saw that when the C++ project is compiled the arch flags are used, up to compute capability 7.0: -gencode=arch=compute_70,code=\"sm_70,compute_70\". But as I understand it the Ampere architecture compute capability is 8.6, so I don't understand how it succeeds in running on Ampere? I guess, as Homer512 writes, thanks to the driver? "The CUDA runtime will then detect that your kernels are built for a version that is not compatible with Ampere. So it will take the PTX and compile a new version at runtime." new edit: "only a current driver is required on the ..." – Stav Bodik Dec 04 '22 at 12:04
  • @StavBodik PTX is upwards compatible and can be compiled into the platform-specific code on-the-fly. When you produce PTX of an older version, e.g. `compute_50`, you cannot use new features that were added later, but as long as the newer GPUs support all features present in older GPUs, those continue to work – Homer512 Dec 04 '22 at 12:43
  • @StavBodik And it makes sense that the driver can do this on its own without any extra executables or toolkits because the driver needs a fully fledged compiler for OpenGL and DirectX shader programs anyway. Adding PTX to that is no big deal – Homer512 Dec 04 '22 at 12:49
  • @StavBodik As others have also said, the Nvidia driver contains a PTX compiler, even if no toolkit is installed. It compiles the kernels on the fly when the program is executed and there is no compiled version for the GPU included in your program. – Sebastian Dec 04 '22 at 14:42
  • Thanks everyone for the help, I really appreciate it. For the future, if someone reads this: everything started with me trying to understand why my old TensorFlow is not running on the new GPU while other projects are. The answer is that both are able to run, but TensorFlow's PTX compile time on the first run is very, very long. This page may help you understand more: https://developer.nvidia.com/blog/cuda-pro-tip-understand-fat-binaries-jit-caching/ – Stav Bodik Dec 04 '22 at 16:16
  • https://stackoverflow.com/questions/64875304/tensorflow-2-2-taking-a-long-time-to-start – Stav Bodik Dec 04 '22 at 16:17
  • @StavBodik If you are doing TensorFlow, you might want to look into updating your toolkit to CUDA-11 anyway. While the PTX keeps everything working, it cannot use new features and a lot of the additions in Ampere cater towards machine learning https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html – Homer512 Dec 04 '22 at 17:36
  • @Homer512 Thanks, I know, but I want to avoid upgrading (personal reasons); that's what this whole thing was about. – Stav Bodik Dec 04 '22 at 18:55
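To tie the comment thread together, here is a small sketch of how the arch flags decide whether PTX ends up in the binary, and therefore whether a build from an older toolkit can still run on an Ampere card via the driver's JIT. The file name and flag values are illustrative only; they loosely follow Section 1.4.1 of the compatibility guide linked above:

```cuda
// example.cu (hypothetical file, for illustration)
//
// With PTX embedded, the fat binary stays forward compatible: an Ampere
// (sm_86) driver can JIT-compile the compute_70 PTX at first launch:
//   nvcc -gencode=arch=compute_70,code=sm_70 \
//        -gencode=arch=compute_70,code=compute_70 -o example example.cu
//
// With only the sm_70 cubin (no PTX), the launch on Ampere fails with
// cudaErrorNoKernelImageForDevice:
//   nvcc -gencode=arch=compute_70,code=sm_70 -o example example.cu
//
// Whether PTX made it into the executable can be checked with:
//   cuobjdump --dump-ptx example

#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 256;
    float* data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    scale<<<1, n>>>(data, 2.0f, n);
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("status: %s, data[0] = %.1f\n", cudaGetErrorString(err), data[0]);

    cudaFree(data);
    return 0;
}
```

On an Ampere card the PTX-only path makes the very first launch slower because of the JIT step; the driver caches the compiled result, which is the behaviour behind the TensorFlow startup delay and the fat-binaries/JIT-caching article linked in the comments above.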