
I've been reading about template functions in the CUDA Programming Guide. Is something like this supposed to work?

#include <cstdio>

/* host struct */
template <typename T>
struct Test {
    T  *val;
    int size;
};

/* struct device */
template <typename T>
__device__ Test<T> *d_test;

/* test function */
template <typename T>
T __device__ testfunc() {
    return *d_test<T>->val;
}

/* test kernel */
__global__ void kernel() {
    printf("funcout = %g \n", testfunc<float>());
}

I get the correct result but a warning:

warning: a host variable "d_test [with T=T]" cannot be directly read in a device function

Does the struct in the test function have to be instantiated explicitly, i.e. with *d_test<float>->val?

KR, Iggi

einpoklum

2 Answers


Unfortunately, the CUDA compiler generally seems to have some issues with variable templates. If you look at the generated PTX assembly, you'll see that everything works just fine. The compiler clearly does instantiate the variable template and allocates a corresponding device object:

.global .align 8 .u64 _Z6d_testIfE;

The generated code uses this object just like it's supposed to:

ld.global.u64   %rd3, [_Z6d_testIfE];

I'd consider this warning a compiler bug. Note that I cannot reproduce the issue with CUDA 10 here, so this issue has most likely been fixed by now. Consider updating your compiler…
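To illustrate that the pattern itself is valid, here is a host-only C++14 sketch of the same code with no CUDA involved. The names `h_test` and `demo` are made up for this illustration and do not appear in the question. It is only meant to show that using the variable template as `h_test<T>` inside a function template is enough: the `float` specialization is instantiated when `testfunc<float>` is, so nothing has to be spelled out as `<float>` in the function body.

```cpp
// Host-only sketch (assumption: plain C++14, no nvcc), mirroring the question's code.
template <typename T>
struct Test {
    T  *val;
    int size;
};

// Variable template: every T gets its own, distinct pointer object.
// h_test is a hypothetical host-side stand-in for the question's d_test.
template <typename T>
Test<T> *h_test = nullptr;

// Dependent use of the variable template; h_test<float> is instantiated
// implicitly when testfunc<float> is instantiated.
template <typename T>
T testfunc() {
    return *h_test<T>->val;
}

// Hypothetical helper: points h_test<float> at a local object,
// reads the value back through testfunc<float>(), then resets the pointer.
float demo() {
    float value = 1.5f;
    Test<float> t{&value, 1};
    h_test<float> = &t;
    float out = testfunc<float>();
    h_test<float> = nullptr;
    return out;
}
```

Calling `demo()` reads the stored value back through `testfunc<float>()`, while `h_test<double>` stays a separate, untouched object.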

Michael Kenzel
  • I was able to reproduce the issue on CUDA 10.1 with Fedora 29, using `-std=c++14`, so I think it may still be present. Thanks to those who filed bugs. – Robert Crovella Mar 28 '19 at 20:41
  • Ok thanks! I'm using CentOS 7.6, CUDA 10.1 and the gcc (g++) 8.2.1 compiler from the devtoolset-8 with -std=c++14. – Herzog Igzorn Mar 29 '19 at 18:50

@MichaelKenzel is correct.

This is almost certainly an nvcc bug, which I have now filed (you might need an account to access it).

Also note I've been able to reproduce the issue with less code:

template <typename T>
struct foo { int  val; };

template <typename T>
__device__ foo<T> *x;

template <typename T>
int __device__ f() { return x<T>->val; }

__global__ void kernel() { int y = f<float>(); }

You can have a look at the result on GodBolt as well.

einpoklum