How can I compile a CUDA program for sm_1X AND sm_2X when I have a surface declaration

Question

I am writing a library that uses a surface (to re-sample and write to a texture) for a performance gain:

...
surface<void,  2> my_surf2D; //allows writing to a texture
...

The target platform GPU has compute capability 2.0 and I can compile my code with:

nvcc -arch=sm_20 ...

and it works just fine.

The problem is when I am trying to develop and debug the library on my laptop, which has an NVIDIA ION GPU with compute capability 1.1 (I would also like my library to be backwards compatible). I know this architecture does not support surfaces, so I used the nvcc macros in my device code to define an alternate code path for it:

#if (__CUDA_ARCH__ < 200)
#warning using kernel for CUDA ARCH < 2.0
...
temp_array[...] =  tex3D(my_tex,X,Y,Z+0.5f);
#else
...
surf2Dwrite( tex3D(my_tex,X,Y,Z+0.5f), my_surf2D, ix*4, iy,cudaBoundaryModeTrap);
#endif
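To check which branch the device compiler actually picked, here is a small sketch. `path_id` and `query_device_path` are hypothetical names made up for illustration. In a fat binary built with one `-gencode` per target (for example `-gencode arch=compute_11,code=sm_11 -gencode arch=compute_20,code=sm_20`), each device pass sees its own value of `__CUDA_ARCH__` (110 for compute_11, 200 for compute_20). The runtime then loads the image that matches the GPU, so the marker should report which path ran.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical marker kernel: records which preprocessor branch the device
// compiler selected for the architecture this image was built for.
__global__ void path_id(int *out)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
    *out = 1;   // fallback branch (compute capability < 2.0, no surfaces)
#else
    *out = 2;   // surface branch (compute capability >= 2.0)
#endif
}

// Returns 1 or 2 for the branch that ran on the device, or -1 on any CUDA
// error (e.g. no GPU present, or no binary image for this GPU).
int query_device_path()
{
    int *d_out = NULL;
    int h_out = -1;
    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess)
        return -1;

    path_id<<<1, 1>>>(d_out);
    cudaError_t err = cudaGetLastError();   // catches launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    if (err != cudaSuccess)
        return -1;
    assert(h_out == 1 || h_out == 2);       // the kernel only writes 1 or 2
    return h_out;
}
```

On the ION (compute 1.1) this should return 1, and on the compute 2.0 target it should return 2.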

The problem is that when I do:

nvcc -gencode arch=compute_11,code=sm_11

I get this error:

ptxas PTX/myLibrary.ptx, line 1784; fatal  : Parsing error near '.surf': syntax error

When I look at the PTX file, I see what appears to be the surface declaration:

.surf .u32 _ZN16LIB_15my_surf2DE;

If I try to put a similar macro around the surface declaration in my source code:

#ifdef __CUDACC__
#if __CUDA_ARCH__ < 200
#warning skipping surface declaration for nvcc trajectory
#else
surface ...
#endif
#else
#warning keeping surface declaration by default
surface ...
#endif

I get an error saying the surface variable is undefined in the host code call that binds the CUDA surface to the array. Should I add the macro around the bind function as well?

I'm not sure whether this is possible or whether I goofed somewhere. Please help.

Are you using preprocessor "protection" around the declaration of the surfaces as well as the access calls inside device code? — talonmies, Apr 15 '12 at 14:39
What you're trying to do sounds like it should work just fine. Which variable is undefined in your error? — Roger Dahl, Apr 15 '12 at 14:44
I will update my post with more details about how I tried to wrap surface declaration. — FizxMike, Apr 15 '12 at 20:13

score 3 · Accepted Answer · edited May 23 '17 at 12:08

Figured this thread should show up as answered...

I got it to work (quite simple, actually). You must put a macro around all three places where the surface reference is used, and be careful to use the macros properly (it turns out that __CUDACC__ is not necessary).

The following only changes the code when compiling for compute capability < 2.0:

The surface declaration:

// enable backwards compatibility:
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
#warning skipping surface declarations for compute capability < 2.0
#else
surface<void,  2> my_surf2D; //allows writing to a texture
#endif

Surface binding:

#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
#warning skipping cudaBindSurfaceToArray for compute capability < 2.0
...
#else
errorCode = cudaBindSurfaceToArray(my_surf2D, my_cudaArray2D);
#endif

And Surface writing:

#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
#warning using kernel for compute capability < 2.0
...
temp_array[...] =  tex3D(my_tex,X,Y,Z+0.5f);
#else
...
surf2Dwrite( tex3D(my_tex,X,Y,Z+0.5f), my_surf2D, ix*4, iy,cudaBoundaryModeTrap);
#endif

This works for both virtual and real targets (-arch=compute_XX and -arch=sm_XX respectively).

Thanks to talonmies and Roger Dahl for pointing me in the right direction, and to this answer from talonmies, which has a great explanation of nvcc/CUDA macros.

What is weird to me is that both the surface definition and the surface binding are really host code, and in theory `__CUDA_ARCH__` is not defined in host code. But if it works, it works. — Auron, Jun 09 '14 at 07:48
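Auron's observation matches how nvcc splits the compilation. The host pass never defines `__CUDA_ARCH__`, so `defined(__CUDA_ARCH__) && ...` is false there and host code always takes the `#else` branch. That is why the host side still sees the surface declaration and the `cudaBindSurfaceToArray` call even in an sm_11 build. Only the sm_11 device pass drops the declaration, and with it the `.surf` line that ptxas rejected. A consequence is that on a compute 1.x device the host still reaches the bind call. The sketch below makes the host-side behaviour visible and adds a run-time capability check. This is my suggestion, not part of the accepted answer, and `guard_branch`, `cc_supports_surfaces` and `device_supports_surfaces` are hypothetical helper names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Same guard as in the answer, inside a __host__ __device__ function.
// The host compilation pass never defines __CUDA_ARCH__, so the host copy
// of this function always takes the #else branch, whatever -arch is used.
__host__ __device__ int guard_branch()
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
    return 1;   // only reachable in device code built for compute_1x
#else
    return 2;
#endif
}

// Surface reads/writes require compute capability 2.0 or later.
bool cc_supports_surfaces(int major)
{
    return major >= 2;
}

// Host-side run-time check; returns false if the device cannot be queried.
bool device_supports_surfaces(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;
    return cc_supports_surfaces(prop.major);
}
```

With this helper, the bind in the answer could be guarded at run time, e.g. `if (device_supports_surfaces(dev)) errorCode = cudaBindSurfaceToArray(my_surf2D, my_cudaArray2D);`. The kernel itself still selects its path at compile time through `__CUDA_ARCH__`.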
