In one application, I've got a bunch of CUDA kernels. Some use dynamic parallelism and some don't. I want either to provide a fallback when dynamic parallelism isn't supported, or simply to let the application keep running with reduced or partially available features. How should I go about compiling for this?
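To make the fallback idea concrete, here is a minimal sketch of the kind of runtime check I have in mind. Both function names are hypothetical (they are not from my actual code), and it assumes the only criterion is that dynamic parallelism needs compute capability 3.5 or higher. The CUDA-specific part is guarded so that the helper also builds with a plain host compiler.

```cpp
#include <cassert>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical helper: dynamic parallelism needs compute capability 3.5+.
inline bool supportsDynamicParallelism(int major, int minor) {
    assert(major >= 1 && minor >= 0);  // sanity check on the inputs
    return major > 3 || (major == 3 && minor >= 5);
}

#ifdef __CUDACC__
// Hypothetical helper: query the current device and apply the check above.
// Returns false if the query fails, so callers take the fallback path.
inline bool currentDeviceSupportsDynamicParallelism() {
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return false;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    return supportsDynamicParallelism(prop.major, prop.minor);
}
#endif
```

The host code would call the second helper once at startup and route to either the dynamic-parallelism kernels or the fallback ones. This only covers the runtime decision; it does not solve the compilation problem described next.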
At the moment, kernels that don't require compute 3.5 fail with invalid device function when compiled with -arch=sm_35 and run on a GTX 670 (max sm_30).
AFAIK you can't use multiple -arch=sm_* arguments, and using multiple -gencode=* arguments doesn't help. Also, for separable compilation I've had to create an additional object file using -dlink, but this doesn't get created when targeting compute 3.0 (nvlink fatal : no candidate found in fatbinary) because of -lcudadevrt, which I've needed for 3.5. How should I deal with this?