
I want to use the SIMD video instructions (vadd4, vmax4, etc.) described in Section 8.7.13 of http://docs.nvidia.com/cuda/pdf/ptx_isa_3.1.pdf

I tried the following in my code:

asm("vadd4.u32.u32.u32 %0, %1, %2, %3;" : "=r"(i) : "r"(j) : "r"(k) : "r"(l));

where i, j, k and l are int variables. I used "r" because it is the constraint for a .u32 register.

But on compiling, I get the following error:

error: unknown register name "r"

What should I use instead of "r" here? Or is there something else wrong in the code? (I am using a Tesla card with compute capability 3.5)

1 Answer

I believe you have a slight syntax error. Try this:

asm("vadd4.u32.u32.u32 %0, %1, %2, %3;" : "=r"(i) : "r"(j) , "r"(k) , "r"(l));

                                                           ^        ^
                                                           |        |
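Note the replacement of two of your colons (:) with commas (,). In the inline asm syntax, the first colon introduces the output operands, the second the input operands, and a third begins the clobber list, which is why the compiler tried to interpret "r" as a register name.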

You may wish to refer to the following document:

/usr/local/cuda/doc/pdf/Using_Inline_PTX_Assembly_In_CUDA.pdf

(assuming a standard CUDA 5 Linux install; just use your file search function if you are on a Windows machine)

On page 4 of that document it states:

...you can have multiple input or output operands separated by commas.
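
For context, here is a minimal sketch of the corrected statement inside a complete program. The names vadd4_u32, vadd4_kernel and run_vadd4 are hypothetical and used only for illustration, and I have switched the variables to unsigned int to match the .u32 types. It assumes a device of compute capability 3.0 or higher (e.g. compile with nvcc -arch=sm_35).

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical wrapper around the vadd4 video instruction.
// Each of the four byte lanes of a and b is added independently.
// With no lane mask given, all four lanes of the result come from
// the sum, so the merge operand c only matters if a mask is added.
__device__ unsigned int vadd4_u32(unsigned int a, unsigned int b, unsigned int c)
{
    unsigned int d;
    asm("vadd4.u32.u32.u32 %0, %1, %2, %3;"
        : "=r"(d)                     // output operands
        : "r"(a), "r"(b), "r"(c));    // input operands, comma-separated
    return d;
}

// Single-thread kernel that stores one result.
__global__ void vadd4_kernel(unsigned int a, unsigned int b, unsigned int c,
                             unsigned int *out)
{
    *out = vadd4_u32(a, b, c);
}

// Hypothetical host helper: runs the kernel once and returns the result.
unsigned int run_vadd4(unsigned int a, unsigned int b, unsigned int c)
{
    unsigned int *d_out = 0;
    unsigned int h_out = 0;

    cudaError_t err = cudaMalloc((void **)&d_out, sizeof(unsigned int));
    assert(err == cudaSuccess);

    vadd4_kernel<<<1, 1>>>(a, b, c, d_out);

    // cudaMemcpy waits for the kernel to finish before copying.
    err = cudaMemcpy(&h_out, d_out, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(d_out);
    return h_out;
}
```

Calling run_vadd4(0x01020304, 0x10203040, 0) should return the per-byte sums packed into one 32-bit word. Without the .sat modifier each lane wraps on overflow instead of saturating.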

Robert Crovella
  • You can find examples for the use of the SIMD video instructions via inline PTX in a header file of wrapper functions that NVIDIA makes available to registered developers: https://devtalk.nvidia.com/default/topic/535684/announcements/release-1-1-of-simd-in-a-word-functions-posted/ – njuffa Jun 25 '13 at 08:36