I'm trying to measure peak single-precision FLOPS on my GPU. To do that, I'm modifying a PTX file so that it performs successive MAD instructions on registers. Unfortunately, the compiler removes all of that code, because it does nothing useful: I never load or store any of the data. Is there a compiler flag or pragma I can add so that the compiler leaves the code untouched?
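As a side note on the measurement itself: once a MAD-only kernel survives compilation and has been timed, each MAD (multiply-add, a * b + c) is normally counted as two floating-point operations. The helper below is a minimal host-side sketch of that arithmetic. The function name mad_gflops and its parameters are hypothetical, and it assumes you already know how many MADs each thread executes and the elapsed kernel time in milliseconds (for example, measured with CUDA events).

```cuda
#include <cstdint>

// Hypothetical helper: converts a timed MAD-loop run into GFLOP/s.
// Each MAD/FMA (a * b + c) is counted as two floating-point operations.
double mad_gflops(std::uint64_t threads,
                  std::uint64_t mads_per_thread,
                  double elapsed_ms)
{
    const double flops = 2.0 * static_cast<double>(threads)
                             * static_cast<double>(mads_per_thread);
    // flops / (seconds * 1e9) == flops / (milliseconds * 1e6)
    return flops / (elapsed_ms * 1.0e6);
}
```

For example, 1,048,576 threads that each execute 1,000 MADs in 1 ms correspond to about 2097 GFLOP/s.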

Thanks.

Caian
  • Would inline PTX possibly work? I would think that the compiler would have to include your code in that case, although I've never tried this myself. – sj755 Aug 06 '12 at 05:59
  • @sj755: The assembler is probably the cause of the problem here and inline PTX doesn't help in that case. – talonmies Aug 06 '12 at 06:26

5 Answers

To completely disable optimizations with nvcc, you can use the following:

nvcc -O0 -Xopencc -O0 -Xptxas -O0  // sm_1x targets using Open64 frontend
nvcc -O0 -Xcicc -O0 -Xptxas -O0 // sm_2x and sm_3x targets using NVVM frontend

Note that the resulting code may be extremely slow. The -O0 flag is passed to the host compiler to disable host code optimization. The -Xopencc -O0 and -Xcicc -O0 flags control the compiler frontend (the part that produces PTX) and turn off optimizations there. The -Xptxas -O0 flag controls the compiler backend (the part that converts PTX to machine code) and turns off optimizations in that part. Note that -Xopencc, -Xcicc, and -Xptxas flags are component-level flags, and unless documented in the nvcc manual, should be considered unsupported.

njuffa
  • It does make the code slower on my GPU too, but that shows that the flags work. Even the generated PTX has unoptimized code. Works wonders, thank you! – Kajal May 29 '16 at 12:10

I don't think there is any way to turn off this optimization in the compiler. You can work around it by adding code that stores your values and wrapping that code in a conditional statement that is always false at run time. To make a conditional that the compiler can't prove is always false, use at least one variable whose value is only known at run time (not just constants).
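A minimal sketch of this workaround, assuming a CUDA C++ kernel rather than hand-edited PTX. The names mad_loop, run_mad_loop and write_flag are hypothetical, chosen for illustration. Because write_flag is a kernel argument, the compiler cannot prove the store is dead, so it has to keep the MAD loop that feeds it; for the actual benchmark you pass 0 and nothing is written.

```cuda
#include <cuda_runtime.h>

// Guarded-store trick: the MAD chains survive dead-code elimination only
// because their result *might* be stored, depending on write_flag, which the
// compiler cannot know at compile time.
__global__ void mad_loop(float* out, float a, float b, int iters, int write_flag)
{
    float x = static_cast<float>(threadIdx.x);  // per-thread starting value
    float y = x + 1.0f;
    for (int i = 0; i < iters; ++i) {
        x = x * a + b;  // MAD / FMA
        y = y * a + b;  // second, independent chain
    }
    if (write_flag) {   // pass 0 when benchmarking: nothing is stored
        out[blockIdx.x * blockDim.x + threadIdx.x] = x + y;
    }
}

// Hypothetical host wrapper: runs one block of 32 threads and returns the
// value at out[0] (0.0f if nothing was written). Error checking omitted.
float run_mad_loop(float a, float b, int iters, int write_flag)
{
    const int threads = 32;
    float* d_out = nullptr;
    cudaMalloc(&d_out, threads * sizeof(float));
    cudaMemset(d_out, 0, threads * sizeof(float));
    mad_loop<<<1, threads>>>(d_out, a, b, iters, write_flag);
    float h = 0.0f;
    cudaMemcpy(&h, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h;
}
```

Calling run_mad_loop(1.0f, 1.0f, 3, 1) stores 7.0f for thread 0 (x ends at 3, y at 4), while passing write_flag = 0 leaves the buffer at 0.0f even though the loop still executes.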

Roger Dahl
  • This is the canonical way to do it. If the dummy flag that protects the write is put into constant memory, you get constant cache + broadcast which has very little impact on overall performance as long as there are enough FLOPs/IOPs in the compute phase of the kernel. – talonmies Aug 06 '12 at 06:28
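Following talonmies' suggestion, the guard flag can live in constant memory instead of being passed as a kernel argument. This is a sketch assuming a CUDA C++ kernel; c_write_flag, mad_loop_const_flag and run_mad_loop_const_flag are hypothetical names, and error checking is omitted. Every thread reads the same constant-memory address, which is the broadcast case the comment describes.

```cuda
#include <cuda_runtime.h>

// Guard flag in constant memory, set from the host before the launch.
__constant__ int c_write_flag;

__global__ void mad_loop_const_flag(float* out, float a, float b, int iters)
{
    float x = static_cast<float>(threadIdx.x);
    for (int i = 0; i < iters; ++i) {
        x = x * a + b;  // MAD chain kept alive by the guarded store below
    }
    if (c_write_flag) {
        out[blockIdx.x * blockDim.x + threadIdx.x] = x;
    }
}

// Hypothetical host wrapper: sets the flag, runs one block of 32 threads,
// and returns the value at out[0] (0.0f if nothing was written).
float run_mad_loop_const_flag(float a, float b, int iters, int write_flag)
{
    const int threads = 32;
    cudaMemcpyToSymbol(c_write_flag, &write_flag, sizeof(int));
    float* d_out = nullptr;
    cudaMalloc(&d_out, threads * sizeof(float));
    cudaMemset(d_out, 0, threads * sizeof(float));
    mad_loop_const_flag<<<1, threads>>>(d_out, a, b, iters);
    float h = 0.0f;
    cudaMemcpy(&h, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h;
}
```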

(I am still on CUDA 4.0; this may have changed in newer versions.)

To disable ptxas optimizations (ptxas is the tool that converts PTX into a cubin), pass the option --opt-level 0 (the default is --opt-level 3). To pass this option through nvcc, prefix it with --ptxas-options (short form -Xptxas), for example -Xptxas -O0.

Do note, however, that ptxas performs many useful optimizations, and disabling them may make your code much slower, or even incorrect! For example, it handles register allocation and tries to infer which pointers refer to shared memory and which to global memory.

CygnusX1

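These nvcc flags worked for me:

-g -G -Xcompiler -O0 -Xptxas -O0 -lineinfo -O0

Here -g and -G request host and device debug information (-G also turns off device-code optimizations), -Xcompiler -O0 and -O0 disable host-code optimization, -Xptxas -O0 disables ptxas optimization, and -lineinfo adds source line information to the device code.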

Andrei Pokrovsky
  • These are the flags for which command? Also, please explain what they do and why they solve the OP's problem... – Marki555 Jun 28 '15 at 20:43

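As far as I know, there is no compiler flag or pragma for that, but you can compute more and store less: perform many MADs per thread and write out only a single result, so the store costs little compared with the arithmetic.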
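A minimal sketch of the "compute more, store less" approach, assuming a CUDA C++ kernel: each thread runs four independent MAD chains and then performs exactly one store, so the result is needed (nothing is dead code) while the store is amortized over 4 * iters MADs. The names mad_then_store and run_mad_then_store are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Four independent MAD chains per thread, then a single unconditional store
// that keeps all four chains live.
__global__ void mad_then_store(float* out, float a, float b, int iters)
{
    float acc0 = static_cast<float>(threadIdx.x);
    float acc1 = acc0 + 1.0f;
    float acc2 = acc0 + 2.0f;
    float acc3 = acc0 + 3.0f;
    for (int i = 0; i < iters; ++i) {
        acc0 = acc0 * a + b;
        acc1 = acc1 * a + b;
        acc2 = acc2 * a + b;
        acc3 = acc3 * a + b;
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc0 + acc1 + acc2 + acc3;
}

// Hypothetical host wrapper: launches one block of 32 threads and returns
// the value written by thread 0.
float run_mad_then_store(float a, float b, int iters)
{
    const int threads = 32;
    float* d_out = nullptr;
    cudaMalloc(&d_out, threads * sizeof(float));
    mad_then_store<<<1, threads>>>(d_out, a, b, iters);
    float h = 0.0f;
    cudaMemcpy(&h, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h;
}
```

With a = b = 1 and iters = 2, thread 0's accumulators end at 2, 3, 4 and 5, so it stores 14.0f.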

yyfn