The short answer is that you can't, at least not in the default whole-program compilation mode. There, CUDA device code only has internal linkage, so everything needed to compile a kernel must be defined within the same translation unit.
What you might be able to do is put the functions into a header file like this:
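    // Both functions in func.cuh
    #pragma once

    // inline lets the same definition appear in every .cu file that includes this header
    __device__ inline int add(int a, int b)
    {
        return a + b;
    }

    __device__ inline void fun1(int a, int b)
    {
        int c = add(a, b);  // result is unused; this only illustrates the call
    }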
and include that header file in each .cu file that needs the functions. The CUDA build chain seems to honour the inline keyword, and that sort of declaration doesn't generate duplicate symbols on any of the CUDA platforms I use (which don't include Windows). I am not sure whether it is intended to work or not, so caveat emptor.
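For what it's worth, here is a minimal sketch of how one of those .cu files might use the header. The file name kernel_a.cu, the kernel add_kernel and the host helper run_add_on_device are all made up for illustration. The definition of add is repeated inline only so that the snippet compiles as a single file; in a real project it would come from including func.cuh.

```cuda
// kernel_a.cu (hypothetical file name)
#include <cuda_runtime.h>

// In a real project this definition would come from: #include "func.cuh"
// It is repeated here only so the sketch compiles on its own.
__device__ inline int add(int a, int b)
{
    return a + b;
}

// Hypothetical kernel: a single thread stores add(a, b) in *out.
__global__ void add_kernel(int a, int b, int *out)
{
    *out = add(a, b);
}

// Hypothetical host helper: runs the kernel and copies the result back.
// Returns true on success, false if any CUDA call reported an error.
bool run_add_on_device(int a, int b, int *result)
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess)
        return false;

    add_kernel<<<1, 1>>>(a, b, d_out);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    ok = ok && (cudaMemcpy(result, d_out, sizeof(int),
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(d_out);
    return ok;
}
```

A second file, say kernel_b.cu, could include the same header and call add from its own kernels. Each translation unit then gets its own copy of the device function, which is why nothing has to be resolved across files.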