How to properly link cuda header file with device functions?

Question

I'm trying to decouple my code a bit and something fails. Compilation error:

error: calling a __host__ function("DecoupledCallGpu") from a __global__ function("kernel") is not allowed

Code excerpt:

main.c (has a call to cuda host function):

#include "cuda_computations.h"
...
ComputeSomething(&var1,&var2);
...

cuda_computations.cu (has the kernel and the host master functions, and includes the header that declares the device functions):

#include "cuda_computations.h"
#include "decoupled_functions.cuh"
...
__global__ void kernel(){
...
DecoupledCallGpu(&var_kernel);
}

void ComputeSomething(int *var1, int *var2){
//allocate memory and etc..
...
kernel<<<20,512>>>();
//cleanup
...
}

decoupled_functions.cuh:

#ifndef _DECOUPLEDFUNCTIONS_H_
#define _DECOUPLEDFUNCTIONS_H_

void DecoupledCallGpu(int *var);

#endif

decoupled_functions.cu:

#include "decoupled_functions.cuh"

__device__ void DecoupledCallGpu(int *var){
  *var=0;
}


Compilation:

nvcc -g --ptxas-options=-v -arch=sm_30 -c cuda_computations.cu -o cuda_computations.o -lcudart

Question: why is it that the DecoupledCallGpu is called from host function and not a kernel as it was supposed to?

P.S.: I can share the actual code behind it if you need me to.

Well, in all those code snippets you've shown neither "ComputeDensityGpu" nor "DoColision", which were the actual functions listed in the error message. So you leave us guessing. But it looks to me like your `DecoupledCallGpu` prototype in `decoupled_functions.cuh` is missing the `__device__` decorator. And separating the compilation of a device function from the compilation unit where it is invoked will probably mean that you have to use [separate compilation and linking](http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#using-separate-compilation-in-cuda). — Robert Crovella, Jun 27 '14 at 19:57

Robert Crovella · Accepted Answer · 2014-06-28T00:56:36.880

Add the __device__ decorator to the prototype in decoupled_functions.cuh. That should take care of the error message you are seeing.

Then you'll need to use separate compilation and linking amongst your modules, because the device function is defined in a different compilation unit from the kernel that calls it. So instead of compiling with -c, compile with -dc, and modify your link command to match. A basic example is here.
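As a rough sketch of both steps applied to the files in the question, the corrected header could look like the block below, with one possible set of build commands shown as comments. The object file names and the executable name `app` are assumptions, and `-arch=sm_30` is simply carried over from the question (recent CUDA toolkits no longer support that architecture).

```cuda
// decoupled_functions.cuh: the prototype now carries the __device__ decorator.
#ifndef _DECOUPLEDFUNCTIONS_H_
#define _DECOUPLEDFUNCTIONS_H_

__device__ void DecoupledCallGpu(int *var);

#endif

// Possible build steps (file and output names are assumptions):
//
//   # -dc compiles to relocatable device code instead of plain -c
//   nvcc -g -arch=sm_30 -dc cuda_computations.cu   -o cuda_computations.o
//   nvcc -g -arch=sm_30 -dc decoupled_functions.cu -o decoupled_functions.o
//
//   # linking through nvcc also performs the device link step
//   nvcc -arch=sm_30 cuda_computations.o decoupled_functions.o main.o -o app
```

If main.c is compiled as C rather than C++, ComputeSomething would probably also need an extern "C" declaration, unless cuda_computations.h already provides one; the question does not show that header.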

Your question is a bit confusing:

Question: why is it that the DecoupledCallGpu is called from host function and not a kernel as it was supposed to?

I can't tell if you're tripping over English or if there is a misunderstanding here. The actual error message states:

error: calling a __host__ function("DecoupledCallGpu") from a __global__ function("kernel") is not allowed

This arises because, within the compilation unit (i.e. the file being compiled, cuda_computations.cu, together with everything it includes), the only description of the function DecoupledCallGpu() is the prototype provided in the header:

void DecoupledCallGpu(int *var);

This prototype indicates an undecorated function in CUDA C, and such functions are equivalent to __host__ (only) decorated functions:

__host__ void DecoupledCallGpu(int *var);

That compilation unit has no knowledge of what is actually in decoupled_functions.cu.

Therefore, when you have kernel code like this:

__global__ void kernel(){       //<- __global__ function
...
DecoupledCallGpu(&var_kernel);  //<- appears as a __host__ function to compiler
}

the compiler thinks you are trying to call a __host__ function from a __global__ function, which is illegal.
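To see the declaration rule in isolation, here is a minimal single-file sketch. It is not the asker's program: the `out` parameter, the local value 123, and the `main` function are made up for illustration. Because the definition lives in the same file as the kernel, it builds with a plain `nvcc file.cu` and needs no -dc; the point is only that the compiler accepts the call once the declaration it sees before the kernel is marked __device__.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Declaration seen by the compiler before the kernel. With __device__ present,
// the call inside the kernel is legal. Removing __device__ here would make the
// compiler treat this as a __host__ function, as in the question.
__device__ void DecoupledCallGpu(int *var);

__global__ void kernel(int *out) {
    int var_kernel = 123;           // hypothetical starting value
    DecoupledCallGpu(&var_kernel);  // resolved as a __device__ function
    out[0] = var_kernel;            // copy the result out so the host can read it
}

// Definition. It sits in the same compilation unit here; moving it to another
// .cu file is what would require compiling with -dc.
__device__ void DecoupledCallGpu(int *var) {
    *var = 0;
}

int main() {
    int *d_out = nullptr;
    int h_out = -1;

    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    kernel<<<1, 1>>>(d_out);

    // Wait for the kernel and surface any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("var_kernel after DecoupledCallGpu: %d\n", h_out);
    return 0;
}
```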

For Visual Studio users, this option translates to changing "Generate relocatable device code" to "Yes" in CUDA C++ options tab. See: https://stackoverflow.com/a/45258292/6734314 — astrowalker, Oct 05 '18 at 07:16
