Completely disable optimizations on NVCC

Question

I'm trying to measure peak single-precision FLOPS on my GPU. To do that, I'm modifying a PTX file to perform successive MAD instructions on registers. Unfortunately, the compiler removes all of the code because it does nothing useful: I never load or store the data. Is there a compiler flag or pragma I can add so that the compiler leaves the code untouched?

Thanks.

Would inline PTX possibly work? I would think that the compiler would have to include your code in that case, although I've never tried this myself. — sj755, Aug 06 '12 at 05:59
@sj755: The assembler is probably the cause of the problem here and inline PTX doesn't help in that case. — talonmies, Aug 06 '12 at 06:26

score 7 · Answer 1 · answered Aug 06 '12 at 16:17

To completely disable optimizations with nvcc, you can use the following:

nvcc -O0 -Xopencc -O0 -Xptxas -O0  # sm_1x targets using Open64 frontend
nvcc -O0 -Xcicc -O0 -Xptxas -O0    # sm_2x and sm_3x targets using NVVM frontend

Note that the resulting code may be extremely slow. The -O0 flag is passed to the host compiler to disable host code optimization. The -Xopencc -O0 and -Xcicc -O0 flags control the compiler frontend (the part that produces PTX) and turn off optimizations there. The -Xptxas -O0 flag controls the compiler backend (the part that converts PTX to machine code) and turns off optimizations in that part. Note that -Xopencc, -Xcicc, and -Xptxas flags are component-level flags, and unless documented in the nvcc manual, should be considered unsupported.

It does make the code slower on my GPU too, but that shows that the flags work. Even the generated PTX has unoptimized code. Works wonders, thank you! — Kajal, May 29 '16 at 12:10

score 3 · Accepted Answer · answered Aug 06 '12 at 04:40

I don't think there is any way to turn off such optimization in the compiler. You can work around this by adding code to store your values and wrapping that code in a conditional statement that is always false. To make a conditional that the compiler can't determine to always be false, use at least one variable (not just constants).
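
A minimal sketch of this workaround, not taken from the original answer: the kernel name `mad_chain`, its parameters, and the launch sizes are all hypothetical. The store is guarded by a run-time `flag` that the host sets to 0, so the compiler cannot prove the arithmetic is dead. With only two dependent chains per thread it is unlikely to reach peak throughput; it only illustrates how the guard keeps the multiply-adds alive.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical benchmark kernel: two chains of multiply-adds held in registers.
// `flag` is 0 at run time, but the compiler cannot know that, so it must keep
// the arithmetic that feeds the guarded store.
__global__ void mad_chain(float *out, float a, float b, int iters, int flag)
{
    float x = threadIdx.x * 0.001f;
    float y = x + 1.0f;
    for (int i = 0; i < iters; ++i) {
        x = x * a + b;   // usually compiled to a single multiply-add instruction
        y = y * a + b;
    }
    if (flag) {          // never taken in this benchmark
        out[blockIdx.x * blockDim.x + threadIdx.x] = x + y;
    }
}

int main()
{
    const int blocks = 64, threads = 256, iters = 1 << 16;  // arbitrary sizes
    float *d_out = nullptr;
    cudaMalloc(&d_out, blocks * threads * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch, then a timed launch. a and b are run-time values,
    // so the loop cannot be folded at compile time.
    mad_chain<<<blocks, threads>>>(d_out, 1.0001f, 0.5f, iters, 0);
    cudaEventRecord(start);
    mad_chain<<<blocks, threads>>>(d_out, 1.0001f, 0.5f, iters, 0);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // 2 chains * 2 flops per multiply-add * iters, for every thread.
    double flops = 4.0 * iters * blocks * threads;
    printf("%.2f GFLOP/s\n", flops / (ms * 1e6));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

Dumping the machine code (for example with `cuobjdump -sass`) is a reasonable way to confirm that the multiply-add instructions survived compilation.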

Roger Dahl

This is the canonical way to do it. If the dummy flag that protects the write is put into constant memory, you get constant cache + broadcast which has very little impact on overall performance as long as there are enough FLOPs/IOPs in the compute phase of the kernel. – talonmies Aug 06 '12 at 06:28
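
The constant-memory variant mentioned in this comment might look like the sketch below. The names `d_store_flag` and `mad_chain_const` are hypothetical, and the launch sizes are arbitrary. Because the host can overwrite a `__constant__` variable at run time, the compiler cannot assume its value and has to keep the guarded store along with the computation feeding it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical dummy flag in constant memory; the host sets it to 0.
__constant__ int d_store_flag;

__global__ void mad_chain_const(float *out, float a, float b, int iters)
{
    float x = threadIdx.x * 0.001f;
    for (int i = 0; i < iters; ++i) {
        x = x * a + b;
    }
    if (d_store_flag) {  // read through the constant cache, same value for all threads
        out[blockIdx.x * blockDim.x + threadIdx.x] = x;
    }
}

int main()
{
    const int blocks = 64, threads = 256;
    int zero = 0;
    cudaMemcpyToSymbol(d_store_flag, &zero, sizeof(zero));

    float *d_out = nullptr;
    cudaMalloc(&d_out, blocks * threads * sizeof(float));

    mad_chain_const<<<blocks, threads>>>(d_out, 1.0001f, 0.5f, 1 << 16);
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return 0;
}
```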

score 1 · Answer 3 · answered Aug 06 '12 at 08:47

(I am still on CUDA 4.0; this may have changed in newer versions.)

To disable optimizations in ptxas (the tool that converts PTX into a cubin), pass it the option --opt-level 0 (the default is --opt-level 3). To pass this option through nvcc, prefix it with --ptxas-options.

Note, however, that ptxas performs many useful optimizations, and disabling them may make your code much slower, if not outright incorrect. For example, it performs register allocation and tries to work out which memory accesses go to shared memory and which go to global memory.

score 0 · Answer 4 · answered Jun 27 '15 at 03:10

These worked for me:

-g -G -Xcompiler -O0 -Xptxas -O0 -lineinfo -O0

Andrei Pokrovsky

Which command are these flags for? Also, please explain what they do and why they solve the OP's problem... – Marki555 Jun 28 '15 at 20:43

score -2 · Answer 5 · answered Aug 06 '12 at 04:29

As far as I know, there is no compiler flag or pragma for that, but you can compute more and store less.

yyfn
