Static device variable and kernels in separate file

Question

I want to statically declare a global variable with the __device__ qualifier. At the same time, I want to keep the functions intended for the GPU in a separate file.

However, if I do so, the variable's value is not transferred to the GPU: there are no errors at compile time or at run time, but the memcpy functions do nothing.

When I move the kernel function into the file with the host code, everything works.

I am sure that it should be possible to split host and device functions into separate files in this case, but how do I do this? I have only seen examples where the kernels and host code are in the same file.

I would also be very thankful if somebody explained why it behaves this way.

Sample code is listed below.

Thank you in advance.

Working directory:

$ ls
functionsGPU.cu  functionsGPU.cuh  staticGlobalMemory.cu

staticGlobalMemory.cu:

#include "functionsGPU.cuh"

#if VARIANT == 2
__global__ void checkGlobalVariable(){
    printf("Old value (dev): %f\n", devData);
    devData += 2.0f;
    printf("New value (dev): %f\n", devData);
}
#endif

int main(int argc, char **argv){
    int dev = 0;
    float val = 3.2;

    cudaSetDevice(dev);

    printf("---------\nVARIANT %i\n---------\n", VARIANT);

    printf("Old value (host): %f\n", val);
    cudaMemcpyToSymbol(devData, &val, sizeof(float));
    checkGlobalVariable <<<1, 1>>> ();
    cudaMemcpyFromSymbol(&val, devData, sizeof(float));
    printf("New value (host): %f\n", val);

    cudaDeviceReset();

    return 0;
}

functionsGPU.cuh:

#ifndef FUNCTIONSGPU_CUH
#define FUNCTIONSGPU_CUH

#include <cuda_runtime.h>
#include <stdio.h>

#define VARIANT 1

__device__ float devData;

#if VARIANT == 1
__global__ void checkGlobalVariable();
#endif

#endif

functionsGPU.cu:

#include "functionsGPU.cuh"

#if VARIANT == 1
__global__ void checkGlobalVariable(){
    printf("Old value (dev): %f\n", devData);
    devData += 2.0f;
    printf("New value (dev): %f\n", devData);
}
#endif

This is compiled as

$ nvcc -arch=sm_61 staticGlobalMemory.cu functionsGPU.cu -o staticGlobalMemory

Output if the kernel and host code are in separate files (incorrect):

---------
VARIANT 1
---------
Old value (host): 3.200000
Old value (dev): 0.000000
New value (dev): 2.000000
New value (host): 3.200000

Output if the kernel and host code are in the same file (correct):

---------
VARIANT 2
---------
Old value (host): 3.200000
Old value (dev): 3.200000
New value (dev): 5.200000
New value (host): 5.200000

Robert Crovella · Accepted Answer · 2022-07-31T18:29:42.637

Your code structure, where device code in one compilation unit references device code or device entities in another compilation unit, will require CUDA relocatable device code compilation and linking.

In the case of __device__ variables such as what you have here:

1. Add -rdc=true to your nvcc compilation command line to enable this.
2. Add extern in front of the definition of devData in functionsGPU.cuh.
3. Add __device__ float devData; to staticGlobalMemory.cu.

In the case of linking to a __device__ function in a separate file, provide the prototype, typically via a header file as you would with any function in C++, and add -rdc=true to your nvcc compilation command line to enable device code linking. Steps 2 and 3 above are not needed.
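A minimal sketch of the __device__ function case might look like the following. The file names (deviceFunctions.cuh, deviceFunctions.cu, main.cu), the function addTwo, and the kernel useAddTwo are hypothetical and do not come from the question. The three files are shown in one listing, separated by comments, and need to be split before compiling.

```cuda
// ===== deviceFunctions.cuh (hypothetical file name) =====
#ifndef DEVICEFUNCTIONS_CUH
#define DEVICEFUNCTIONS_CUH

// Prototype only; the body lives in deviceFunctions.cu.
__device__ float addTwo(float x);

#endif

// ===== deviceFunctions.cu =====
#include "deviceFunctions.cuh"

// Hypothetical helper, defined in its own compilation unit.
__device__ float addTwo(float x){
    return x + 2.0f;
}

// ===== main.cu =====
#include <cuda_runtime.h>
#include <stdio.h>
#include "deviceFunctions.cuh"

// This kernel calls a __device__ function that is defined in another
// compilation unit, so the build needs device code linking (-rdc=true).
__global__ void useAddTwo(float in){
    printf("Result (dev): %f\n", addTwo(in));
}

int main(){
    useAddTwo<<<1, 1>>>(3.2f);
    cudaDeviceSynchronize();  // wait for the kernel so its printf output appears
    return 0;
}

// Build with relocatable device code enabled:
//   nvcc -arch=sm_61 -rdc=true main.cu deviceFunctions.cu -o deviceFunctionDemo
```

Only the prototype is shared through the header here, so no extern declaration or separate definition is involved.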

That should fix the issue. Step 1 provides the necessary cross-module linkage, and steps 2 and 3 will fix the duplicate definition problem you would have, since you are including the same variable via a header file in separate compilation units.
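As a sketch of steps 1 to 3 applied to the code from the question, the files could end up looking roughly like this. The VARIANT switch is dropped for brevity, and the three files are shown in one listing, separated by comments, so they need to be split before compiling. The likely reason the original build misbehaves is that, without -rdc=true, each .cu file is compiled as its own self-contained device program. Each one then gets a private copy of devData, and the host code copies to one copy while the kernel reads the other.

```cuda
// ===== functionsGPU.cuh =====
#ifndef FUNCTIONSGPU_CUH
#define FUNCTIONSGPU_CUH

#include <cuda_runtime.h>
#include <stdio.h>

// Step 2: declaration only. The single definition is in staticGlobalMemory.cu.
extern __device__ float devData;

__global__ void checkGlobalVariable();

#endif

// ===== functionsGPU.cu =====
#include "functionsGPU.cuh"

// The kernel refers to devData, which is defined in another compilation unit.
__global__ void checkGlobalVariable(){
    printf("Old value (dev): %f\n", devData);
    devData += 2.0f;
    printf("New value (dev): %f\n", devData);
}

// ===== staticGlobalMemory.cu =====
#include "functionsGPU.cuh"

// Step 3: the one definition of devData for the whole program.
__device__ float devData;

int main(){
    float val = 3.2f;

    cudaSetDevice(0);

    printf("Old value (host): %f\n", val);
    cudaMemcpyToSymbol(devData, &val, sizeof(float));    // host -> device symbol
    checkGlobalVariable<<<1, 1>>>();
    cudaMemcpyFromSymbol(&val, devData, sizeof(float));  // device symbol -> host
    printf("New value (host): %f\n", val);

    cudaDeviceReset();
    return 0;
}

// Step 1: build with relocatable device code enabled:
//   nvcc -arch=sm_61 -rdc=true staticGlobalMemory.cu functionsGPU.cu -o staticGlobalMemory
```

With this layout the host and device values should agree, as in the VARIANT 2 output from the question.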

For a reference on how to set up device code compilation in Windows Visual Studio, see here.
