Questions tagged [cuda-driver]

A lower-level C-language API for managing computational work in the CUDA platform on NVIDIA GPU hardware.

This tag refers to the CUDA Driver API. It is a lower-level alternative to the much more common CUDA Runtime API. Both are part of the CUDA platform and offer different levels of abstraction when programming general-purpose GPU applications.

The Driver API's programming style closely resembles that of OpenCL. Unlike the Runtime API, it does not require the nvcc compiler, and it supports runtime compilation of kernel source code by means of the NVRTC library.
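As an illustration, here is a minimal sketch (error checking omitted) of how NVRTC runtime compilation feeds into the Driver API. It assumes a CUDA toolkit installation providing nvrtc.h and cuda.h, plus an NVIDIA GPU at runtime; the axpy kernel is a made-up example:

```cpp
#include <nvrtc.h>
#include <cuda.h>
#include <string>

int main() {
    const char* src =
        "extern \"C\" __global__ void axpy(float a, float* x, float* y) {\n"
        "    y[threadIdx.x] += a * x[threadIdx.x];\n"
        "}\n";

    // Compile CUDA C++ source to PTX at runtime - no nvcc involved.
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "axpy.cu", 0, nullptr, nullptr);
    nvrtcCompileProgram(prog, 0, nullptr);

    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::string ptx(ptxSize, '\0');
    nvrtcGetPTX(prog, &ptx[0]);
    nvrtcDestroyProgram(&prog);

    // Load the PTX with the Driver API and obtain a kernel handle.
    cuInit(0);
    CUdevice dev;   cuDeviceGet(&dev, 0);
    CUcontext ctx;  cuCtxCreate(&ctx, 0, dev);
    CUmodule mod;   cuModuleLoadData(&mod, ptx.c_str());
    CUfunction fn;  cuModuleGetFunction(&fn, mod, "axpy");
    // ... set up device buffers and launch with cuLaunchKernel(), then:
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
}
```

Note the OpenCL-like flow: compile source to an intermediate representation, load it as a module, and look up the kernel by name before launching.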

Members of the CUDA Driver API are prefixed with cu, while members of the Runtime API are prefixed with cuda. E.g.: cudaGetErrorName (Runtime API) vs cuGetErrorName (Driver API).
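A minimal sketch contrasting the two naming (and calling) conventions, assuming the toolkit's cuda.h and cuda_runtime.h headers are available:

```cpp
#include <cuda.h>          // Driver API: cu* prefix
#include <cuda_runtime.h>  // Runtime API: cuda* prefix
#include <cstdio>

int main() {
    // Runtime API: returns the name string directly.
    std::printf("%s\n", cudaGetErrorName(cudaErrorInvalidValue));

    // Driver API: returns a status code and writes the name
    // through an out-parameter.
    const char* name = nullptr;
    cuGetErrorName(CUDA_ERROR_INVALID_VALUE, &name);
    std::printf("%s\n", name);
}
```

Beyond the prefix, this also shows a typical stylistic difference: Driver API calls return a CUresult status and pass results back through pointers.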

NVIDIA's documentation on the difference between the driver and runtime APIs.

Questions about the CUDA Driver API can be asked here on Stack Overflow, but if you have bugs to report, you should discuss them on the CUDA forums or report them via the registered developer portal. You may want to cross-link any such discussion to your question here on SO.

46 questions
1
vote
0 answers

What are the new unique-id's for CUDA streams and contexts useful for?

CUDA 12 introduces two new API calls, cuStreamGetId() and cuCtxGetId() which return "unique ID"s associated with a stream or a context respectively. I'm struggling to understand why this is useful, or how this would be used. Are the handles for…
einpoklum
1
vote
0 answers

Can userspace code leverage NVIDIA's open-sourcing of their kernel modules?

NVIDIA has recently announced they are open-sourcing (a variant of) their GPU Linux kernel driver. They are not, however, open-sourcing the user-mode driver libraries (e.g. libcuda.so). It's a gradual process and not all GPUs are supported…
einpoklum
1
vote
0 answers

How can I interact with NVIDIA's JIT compilation cache?

(Following Is NVIDIA's JIT compilation cache used when you don't use NVCC?) NVIDIA's JIT compilation cache (which we find in ~/.nv/CompilationCache on Linux systems) has a somewhat opaque structure, with a non-textual index. I would like to be able…
einpoklum
1
vote
2 answers

Is NVIDIA's JIT compilation cache used when you don't use NVCC?

As we should all know (but not enough people do), when you build a CUDA program with NVCC, and run it on a device for which fully-compiled (SASS) code for the specific device is not included in the binary - the intermediate PTX code is JITed, and…
einpoklum
1
vote
0 answers

How do I copy 2D CUDA arrays/textures between contexts?

Suppose I want to copy some memory between different CUDA contexts (possibly on different devices). The CUDA Driver API offers me: cuMemcpyPeer (for plain old device global memory) and cuMemcpy3DPeer (for 3D arrays/textures). But there doesn't seem to…
einpoklum
1
vote
1 answer

Missing symbol: cuDevicePrimaryCtxRelease vs cuDevicePrimaryCtxRelease_v2

I'm trying to build the following program: #include <cuda.h> #include <iostream> int main() { const char* str; auto status = cuInit(0); cuGetErrorString(status, &str); std::cout << "status = " << str << std::endl; int…
einpoklum
1
vote
2 answers

Can I obtain what's used as __nv_nvrtc_builtin_header.h?

I'm profiling a kernel compiled (with debug and lineinfo) using the nvrtc library. In the profiling results, many of the samples are listed as being within __nv_nvrtc_builtin_header.h. However - there is obviously no such file on disk, and naturally…
einpoklum
1
vote
1 answer

What are the "remote writes" which you can await with CU_STREAM_WAIT_VALUE_FLUSH?

When you perform a wait-on-value operation using the CUDA driver API call cuStreamWaitValue32(), you can specify the flag CU_STREAM_WAIT_VALUE_FLUSH. Here's what the documentation says it does: Follow the wait operation with a flush of outstanding…
einpoklum
1
vote
3 answers

What makes cuLaunchKernel fail with CUDA_ERROR_INVALID_HANDLE?

I'm launching a CUDA kernel I've compiled, using the cuLaunchKernel() driver API function. I'm passing my parameters in a kernelParams array, and passing nullptr for the extra argument. Unfortunately, this fails, with the error:…
einpoklum
0
votes
1 answer

What should I link against: The actual CUDA driver library or the driver library stub?

A CUDA distribution, at least on Linux, has a "stub libraries" directory, which contains among others a libcuda.so file - named the same as an actual NVIDIA driver library. When building a CUDA program which makes driver API calls, on a system with…
einpoklum
0
votes
0 answers

How can I check whether CUDA device peer access is enabled (rather than supported)?

If you have a pair of devices for which cuDeviceCanAccessPeer() is true, and you try to disable peer access with cuCtxDisablePeerAccess(), you may get a failure: CUDA_ERROR_PEER_ACCESS_NOT_ENABLED. So, there's ability to access, and there's…
einpoklum
0
votes
0 answers

Why does cuMemGetAccess() take an unsigned long long * rather than a CUmemAccess_flags*?

The CUDA driver API call CUresult CUDAAPI cuMemGetAccess(unsigned long long* flags, const CUmemLocation* location, CUdeviceptr ptr); takes a pointer to a built-in C(++) language type rather than any type definition. Yet…
einpoklum
0
votes
0 answers

Are pools and pool-allocated memory CUDA-context-specific?

Typical CUDA memory allocations, e.g. using cuMemAlloc(), are specific to the current CUDA (driver) context. Is this also true for memory pools? Perhaps for allocations from pools? The driver API for memory pools explicitly mentions devices, but not…
einpoklum
0
votes
1 answer

What does CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC actually allow?

One of the attributes of CUDA memory pools is CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC, described in the doxygen as follows: Allow reuse of already completed frees when there is no dependency between the free and allocation. If a free (a…
einpoklum
0
votes
0 answers

Do CUDA 3D memory copy parameters need to be kept alive?

Consider the CUDA API function CUresult cuMemcpy3DAsync (const CUDA_MEMCPY3D* pCopy, CUstream hStream); described here. It takes a CUDA_MEMCPY3D structure by pointer; and this pointer is not to some CUDA-driver-created entity - it's to a structure…
einpoklum