CUDA started out (over a decade ago) as a largely C-style language. Over time, it migrated to being primarily a C++ variant/definition. For clarity, we should delineate the discussion between device code and host code.
For device code, CUDA claims compliance with a particular C++ standard, subject to various restrictions. One particular restriction is that there is no general support for the standard libraries.
For device code (with some overlap with host code), there is an evolution underway to provide a set of STL-like libraries/features. But as an example, `std::vector` is not usable in CUDA device code (although you can use `new` in CUDA device code).
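As a minimal sketch of that distinction (kernel and variable names here are my own, for illustration):

```cuda
#include <cstdio>

__global__ void device_alloc_demo() {
    // Device-side operator new allocates from the device heap
    // (its size can be raised via cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)).
    int *p = new int[4];
    if (p != nullptr) {
        for (int i = 0; i < 4; ++i) p[i] = i * i;
        printf("p[3] = %d\n", p[3]);
        delete[] p;
    }
    // std::vector<int> v;  // would not compile here: no device-side support
}

int main() {
    device_alloc_demo<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```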
For host code, there really isn't anything that is intended to be out of bounds, as long as we are talking about things that are strictly host code. The exceptions to this are undocumented issues that crop up from time to time, for example with `boost` and perhaps many other libraries. These aren't intentional omissions; they arise from the fact that CUDA uses a special preprocessor/front-end even for host code, coupled with incomplete testing against every imaginable library one might want to use.
It might also be worthwhile to say, regarding user-supplied libraries (as opposed to standard libraries or system libraries), that CUDA generally requires functions to be decorated appropriately (e.g. with `__device__`) in order to be usable in device code. Whether we are talking about compiled libraries or header-only libraries, these should generally be usable in host code (subject to the caveat above), but not necessarily in device code, unless the library has been specifically decorated for CUDA usage.
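To sketch what "decorated" means (function names here are illustrative, not from any particular library):

```cuda
#include <cstdio>

// __host__ __device__ makes one definition callable from both sides.
__host__ __device__ int square(int x) { return x * x; }

__global__ void use_square(int *out) {
    *out = square(7);            // legal: square is __device__-qualified
}

int main() {
    int *d_out = nullptr, h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    use_square<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d\n", h_out);       // host code may also call square() directly
    cudaFree(d_out);
    return 0;
}
```

An undecorated function from a header-only library will generally be rejected by the device compiler, even if the code would otherwise be legal C++.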
Where host code interfaces with device code, you'll need to follow the limitations fairly closely. Again, a `std::vector` container cannot be easily passed to a device code function call (a CUDA kernel). But as already mentioned in the comments, there is something similar you can do with the thrust library, which is included with the CUDA toolkit install.
> Are these all necessary?
`malloc` and `free` are not necessary. You can similarly use `new` and `delete`, or use the thrust containers.
Regarding the use of raw pointers and, relatedly, C-style arrays: this will probably be more or less unavoidable, as these are part of C++, and there are no higher-level containers in C++ apart from what is in the standard libraries, AFAIK. Use of raw pointers, at least at the host-device interface, is certainly typical. If you use `thrust::device_vector`, for example, you will still need to extract a raw pointer to pass to the kernel.
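A short sketch of that extraction (the kernel name and sizes are my own):

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <cstdio>

__global__ void scale(int *data, int n, int factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    thrust::device_vector<int> d_vec(8, 1);            // device-side container
    int *raw = thrust::raw_pointer_cast(d_vec.data()); // what the kernel needs
    scale<<<1, 8>>>(raw, 8, 3);
    thrust::host_vector<int> h_vec = d_vec;            // copy results back
    printf("%d\n", (int)h_vec[0]);
    return 0;
}
```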
The CUDA runtime and driver APIs still have a largely C-style feel to them. It's not formally part of CUDA, but others have created wrappers to make things more "C++-like". One such example is this library from einpoklum/eyalroz. I have no personal experience with it, but its maintenance seems relatively energetic, a going concern. And as hinted in the comments, via C++ overloads and e.g. replaceable functionality in various containers and library constructs, you can probably build a container or construct that does what you want, perhaps by replacing standard allocators, etc.
As already mentioned, thrust intends to provide a container/algorithm approach to leverage those kinds of C++ concepts in a CUDA environment.
It's not part of CUDA, but NVIDIA also offers a way to accelerate standard C++ code, for example via the `nvc++` compiler's `-stdpar` option, which can offload C++17 parallel algorithms to the GPU.
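For instance (a sketch assuming the NVIDIA HPC SDK's `nvc++` compiler; built with an ordinary host compiler, the same code simply runs on the CPU):

```cpp
#include <algorithm>
#include <execution>
#include <vector>
#include <cstdio>

int main() {
    std::vector<double> v(1000, 2.0);
    // Built with `nvc++ -stdpar`, this parallel algorithm may be
    // offloaded to the GPU; no CUDA-specific syntax is required.
    std::transform(std::execution::par, v.begin(), v.end(), v.begin(),
                   [](double x) { return x * x; });
    printf("%f\n", v[0]);
    return 0;
}
```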