Writing PTX instead of CUDA for optimization: does it make sense?

Question

I'm developing CUDA code. Recently, new device-language targets such as the PTX and SPIR backends were announced, and I have come across some applications that are developed with them. At the very least, I think we can say that PTX is sufficient for developing something at product level.

As we know, PTX is not real device code; it is just an intermediate language for NVIDIA GPUs. My question is: what if I write PTX instead of CUDA? Would I naturally end up with better-optimized code if I use PTX? Does that make sense?

On the other hand, what is the motivation behind the PTX language?

Thanks in advance

Roger Dahl · Answer 1 · 2014-03-04T23:49:18.493

Yes, it can make sense to implement CUDA code in PTX, just as it can make sense to implement regular CPU code in assembly instead of C++.

For instance, in CUDA C, there is no efficient way of capturing the carry flag and including it in new calculations. So it can be hard to implement efficient math operations that use more bits than what is supported natively by the machine (which is 32 bits on all current GPUs). With PTX, you can efficiently implement such operations.
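As an illustration of the carry-flag point, here is a minimal sketch (not the code from my project) of a 64-bit addition built from two 32-bit adds using inline PTX. The names `add64_carry`, `add64_kernel`, and `add64_on_device` are made up for this example; `add.cc.u32` and `addc.u32` are the PTX add-with-carry-out and add-with-carry-in instructions.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// 64-bit add built from two 32-bit adds chained through the carry flag.
// add.cc.u32 writes the carry-out; addc.u32 consumes it as carry-in.
// Both instructions sit in one asm statement so nothing can be
// scheduled between them and clobber the carry.
__device__ __forceinline__ uint2 add64_carry(uint2 a, uint2 b)
{
    uint2 r;  // r.x = low 32 bits, r.y = high 32 bits
    asm("add.cc.u32 %0, %2, %4;\n\t"
        "addc.u32   %1, %3, %5;"
        : "=r"(r.x), "=r"(r.y)
        : "r"(a.x), "r"(a.y), "r"(b.x), "r"(b.y));
    return r;
}

__global__ void add64_kernel(uint2 a, uint2 b, uint2* out)
{
    *out = add64_carry(a, b);
}

// Hypothetical host helper: performs one addition on the GPU.
// Error handling is reduced to asserts to keep the sketch short.
uint64_t add64_on_device(uint64_t a, uint64_t b)
{
    uint2 ua = make_uint2((unsigned)a, (unsigned)(a >> 32));
    uint2 ub = make_uint2((unsigned)b, (unsigned)(b >> 32));
    uint2 h_out = make_uint2(0u, 0u);
    uint2* d_out = nullptr;

    cudaError_t err = cudaMalloc(&d_out, sizeof(uint2));
    assert(err == cudaSuccess);

    add64_kernel<<<1, 1>>>(ua, ub, d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    err = cudaMemcpy(&h_out, d_out, sizeof(uint2), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);

    return ((uint64_t)h_out.y << 32) | h_out.x;
}
```

The same pattern extends to wider integers by chaining `addc.cc.u32` for the middle words, which both consumes and produces the carry.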

I implemented a project in both CUDA C and PTX, and saw significant speedup in PTX. Of course, you will only see a speedup if your PTX code is better than the code created by the compiler from plain CUDA C.

I would recommend first creating a CUDA C version for reference. Then create a copy of the reference and start replacing parts of it with PTX, as guided by profiling results, while making sure the output still matches that of the reference.

As for the motivation for PTX, it provides an abstraction that lets NVIDIA change the native machine language between generations of GPUs without breaking backwards compatibility.

Thank you for your answer, Roger. But how did you get started with PTX? I could find only one paper about it. Do you have any examples? — grypp, Mar 05 '14 at 13:27

score 3 · Answer 2 · answered Mar 08 '14 at 22:56

The main advantage of developing in PTX is that it can give you access to certain features that are not exposed directly in CUDA C: for instance, certain cache modifiers on load instructions, some packed SIMD operations, and predicates.
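To make the cache-modifier case concrete, here is a rough sketch of a load that uses the `.cg` cache operator (cache in L2, bypass L1) through inline PTX. The names `load_cg`, `copy_cg`, and `copy_cg_roundtrip` are invented for this example, and the `"l"` constraint assumes a 64-bit build where pointers are 64 bits wide.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Load one float with the .cg cache operator via inline PTX.
// Assumption: 64-bit pointers, hence the "l" constraint on the address.
__device__ __forceinline__ float load_cg(const float* p)
{
    float v;
    asm volatile("ld.cg.f32 %0, [%1];" : "=f"(v) : "l"(p));
    return v;
}

// Element-wise copy that reads its input through load_cg.
__global__ void copy_cg(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = load_cg(in + i);
}

// Hypothetical host helper: copies n floats (n > 0) through the kernel
// and reports whether the output equals the input.
bool copy_cg_roundtrip(int n)
{
    std::vector<float> h_in(n), h_out(n, -1.0f);
    for (int i = 0; i < n; ++i) h_in[i] = 0.5f * i;

    size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_in, bytes);
    assert(err == cudaSuccess);
    err = cudaMalloc(&d_out, bytes);
    assert(err == cudaSuccess);

    err = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    copy_cg<<<blocks, threads>>>(d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_in == h_out;
}
```

The cache operator only changes how the load is cached, not the value it returns, so whether it helps has to be measured by profiling the real access pattern.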

That said, I wouldn't advise anyone to code in PTX. On the CUDA Library team, we sometimes wrap PTX routines in a C function via inline assembly, and then use that. But programming in C/C++/Fortran is way easier than writing PTX.

Also, the runtime will recompile your PTX into an internal hardware-specific assembly language. In the process it may reorder instructions, assign registers, and change scheduling. So all of your careful ordering in PTX is mostly unnecessary and usually has little to do with the final assembly code. NVIDIA now ships a disassembler that lets you view the actual internal assembly, so you can compare for yourself if you want to play around with it.

Thank you, Mr. Jonathan. Actually, I am working on code generation. The structure of my generated code is more or less always the same: iterations, reductions, and so on. With that in mind, should I generate PTX code instead of CUDA C? Also, how can I get started with PTX? Are there any examples on the internet? I could only find one or two in the CUDA samples, and they are not enough to understand the advantages of PTX. — grypp, Mar 10 '14 at 12:45
