
I've searched all over for some insight on how exactly to use classes with CUDA, and while there is a general consensus that it can be done and apparently is being done by people, I've had a hard time finding out how to actually do it.

I have a class that implements a basic bitset with operator overloading and the like. I need to be able to instantiate objects of this class on both the host and the device, copy between the two, etc. Do I define this class in a .cu file? If so, how do I use it in my host-side C++ code? The class's functions do not need to access special CUDA variables like threadIdx; it just needs to be usable on both the host and the device.
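For reference, here is a minimal sketch of the kind of class I mean — the names and operations are illustrative, not my actual implementation:

```cpp
#include <cstdint>

// Illustrative word-sized bitset; the real class is larger, but the shape
// is the same: plain data plus overloaded operators, no CUDA-specific state.
class Bitset {
public:
    Bitset() : bits_(0) {}
    void set(int i)        { bits_ |= (std::uint64_t(1) << i); }
    bool test(int i) const { return (bits_ >> i) & 1u; }
    Bitset operator|(const Bitset& o) const {
        Bitset r;
        r.bits_ = bits_ | o.bits_;
        return r;
    }
private:
    std::uint64_t bits_;  // plain old data, so objects could be copied to the device byte-for-byte
};
```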

Thanks for any help, and if I'm approaching this in completely the wrong way, I'd love to hear alternatives.

secshunayt

2 Answers


Define the class in a header that you #include, just like in C++.

Any member function that must be callable from device code should be declared with both the __device__ and __host__ qualifiers, including the constructor and destructor if you plan to use new/delete on the device (note that new/delete require CUDA 4.0 and a GPU of compute capability 2.0 or higher).

You probably want to define a macro like

#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif 

Then use this macro on your member functions:

class Foo {
public:
    CUDA_CALLABLE_MEMBER Foo() {}
    CUDA_CALLABLE_MEMBER ~Foo() {}
    CUDA_CALLABLE_MEMBER void aMethod() {}
};

The reason for this is that only the CUDA compiler recognizes __device__ and __host__ — a host C++ compiler will raise an error if it sees them.

Edit: Note that __CUDACC__ is defined by NVCC when it is compiling CUDA code: either when compiling a .cu file, or when compiling any file with the command-line option -x cu.

harrism
  • Ahh - so when it is included from a .cpp, it is compiled without the __device__ specifiers, and when it is included from a .cu, it is compiled with them. I guess my first fear was that this would cause some sort of disparity when passing from the host to device, but I see now how it works. Thanks a lot! – secshunayt Aug 08 '11 at 07:45
    Just to add a small caveat to Mark's answer that catches a lot of people out - CUDA doesn't support external linkage, so the class and all its methods must be fully defined at compilation unit scope when compiling the class for the GPU. – talonmies Aug 08 '11 at 08:44
    Thanks @talonmies, that's correct. A device code linker is a feature for the future. – harrism Aug 09 '11 at 01:10
  • Wow, I didn't expect to find @harrism here. I saw your picture on the CUDA website; I didn't expect you to be answering simple questions. – TripleS May 30 '12 at 15:09
    To @talonmies' comment, note that CUDA 5 (now in preview) adds linking for device code. – harrism Jun 27 '12 at 12:43
  • @harrism: I wonder now, how would you deal with functions only callable from the device? Just setting them `__device__` and `private`? – datenwolf Jan 10 '13 at 18:26
  • They don't have to be private. Just make sure you don't call them from non-device code. device code can call *public* or *private* `__device__` methods. – harrism Jan 10 '13 at 23:21
  • @harrism: I've not had any success building a simple class that encapsulates a CUDA kernel call and contains a `__device__` member variable. Unfortunately the SDK doesn't contain said example either. Can you suggest how to call a CUDA kernel within your example? Mine compiles fine (shown as a *reply* below), but doesn't work (fails without any errors). The same code without the class encapsulating it works ok. I've experimented with `__device__ __host__` and am compiling with compute_35 and sm_35 and running on a GeForce 580. Any ideas why it doesn't work? – axon Apr 02 '13 at 07:00
  • I don't see a "reply" below -- what do you mean? I think you should post this as a new SO question. – harrism Apr 03 '13 at 04:37
  • Hi @harrism, is it possible to have both host functions and device functions as methods in the same class? – Charles Chow Mar 19 '15 at 04:22
  • Yes, you can do that. – harrism Mar 21 '15 at 05:37
  • @secshunayt the comment given by you has been beneficial as a simple explanation of this answer. @harrism Very nice answer. Can you please update your answer to explain when `__CUDACC__` will be defined? – Sayan Bhattacharjee Mar 08 '16 at 17:41
  • @harrism Now I know that we can use -x cu as a compilation flag. Great info. – Sayan Bhattacharjee Mar 11 '16 at 04:29

Another good resource for this question is the set of code samples that ship with the CUDA toolkit. Within those samples you can find examples of just about anything you could imagine. One that is pertinent to your question is the quadtree.cu file. Best of luck.

t. fochtman