No, it won't exploit any features you need to explicitly program for.
Only those features that are transparent to the user (like cache or larger register files) will be used.
Additionally, you need to make sure your object file contains a version of the code compiled to the PTX intermediate language, which can be just-in-time compiled for the target architecture; otherwise your program will not even run.
Compile to a virtual architecture (nvcc -arch compute_13) to ensure that, or create a fat binary with code for multiple architectures using the -gencode option to nvcc.
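With a fat binary, you can program for features available only on higher compute capabilities if you wrap the code inside #if __CUDA_ARCH__ >= xyz preprocessor conditionals, where xyz is the compute capability written as a three-digit number (for example, 200 for compute capability 2.0).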
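To make the last point concrete, here is a minimal sketch of how such a conditional might look. The helper names (atomic_add_double, sum_kernel, sum_on_gpu), the file name, and the architecture numbers in the build command are my own illustrative assumptions; pick values that your toolkit and hardware actually support. The example relies on the fact that a native double-precision atomicAdd exists only on compute capability 6.0 and higher, so older targets get a compare-and-swap fallback.

```cuda
/* example.cu (file name and architecture numbers are assumptions).
   One possible fat-binary build:
     nvcc -gencode arch=compute_50,code=sm_50
          -gencode arch=compute_60,code=sm_60
          -gencode arch=compute_60,code=compute_60
          example.cu -o example
   (all on one command line). The last -gencode embeds PTX so that
   newer GPUs can JIT-compile the code. */
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: atomically adds val to *addr, returns the old value.
__device__ double atomic_add_double(double* addr, double val)
{
#if __CUDA_ARCH__ >= 600
    // Compiled only into the device code for compute capability 6.0+,
    // which has a native double-precision atomicAdd.
    return atomicAdd(addr, val);
#else
    // Fallback for older targets: emulate with a compare-and-swap loop.
    unsigned long long int* p = (unsigned long long int*)addr;
    unsigned long long int old = *p, assumed;
    do {
        assumed = old;
        old = atomicCAS(p, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
#endif
}

// Each thread adds one input element into the single output value.
__global__ void sum_kernel(const double* in, double* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomic_add_double(out, in[i]);
}

// Host wrapper: copies v to the device, sums it there, returns the result.
double sum_on_gpu(const std::vector<double>& v)
{
    const int n = static_cast<int>(v.size());
    double *d_in = nullptr, *d_out = nullptr;
    double result = 0.0;
    cudaMalloc(&d_in, n * sizeof(double));
    cudaMalloc(&d_out, sizeof(double));
    cudaMemcpy(d_in, v.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_out, &result, sizeof(double), cudaMemcpyHostToDevice);
    sum_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    // This blocking copy also surfaces errors from the kernel launch,
    // e.g. when the binary has no code the device can run.
    cudaError_t err = cudaMemcpy(&result, d_out, sizeof(double),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main()
{
    std::vector<double> v(1024, 1.0);
    std::printf("sum = %.1f\n", sum_on_gpu(v));
    return 0;
}
```

Each -gencode target gets its own device compilation pass with its own value of __CUDA_ARCH__, so the sm_60 code in the fat binary uses the native instruction while the sm_50 code uses the fallback. The macro is undefined in the host pass, so it should only be used to select between device code paths.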