Are CUDA streams device-associated? And how do I get a stream's device?

Question

I have a CUDA stream which someone handed to me - a cudaStream_t value. The CUDA Runtime API does not seem to indicate how I can obtain the index of the device with which this stream is associated.

Now, I know that cudaStream_t is just a pointer to a driver-level stream structure, but I'm hesitant to delve into the driver too much. Is there an idiomatic way to do this? Or some good reason not to want to do it?

Edit: Another aspect to this question is whether the stream really is associated with a device in a way in which the CUDA driver itself can determine that device's identity given the pointed-to structure.

I am not a CUDA driver expert and I have no practical experience with multi-GPU programming. But I would expect each CUDA stream to be specific to a particular CUDA context. I would also expect each GPU to have its own CUDA context. That would mean a CUDA stream handle is not unique across devices, just unique for each given device. So you may need to pass a pair {device number, stream handle} in the app. — njuffa, Jul 17 '15 at 18:41
@njuffa: If a CUDA stream is context-specific, and if a context is device-specific, doesn't that mean a CUDA stream handle _is_ unique across devices? Perhaps I'm not following you... — einpoklum, Jul 17 '15 at 18:46
A unique stream handle across all devices implies a global "namespace" for stream handles. I do not think that exists. So assume two GPUs, each with its own context: the first stream created in each context may get the handle value 1. Somebody passes a stream handle with value 1 to your code. Which device does it belong to? We can't tell. — njuffa, Jul 17 '15 at 18:49
@njuffa: `cudaStream_t`s are pointers to structures, not integer handles (like the CUDA device identifiers). What makes you believe these are non-unique handles? — einpoklum, Jul 17 '15 at 19:00
@RobertCrovella: But it might be the case that even the CUDA driver can't determine which device the stream is associated with, and launches on other devices would fail ungracefully. I mean, it sounds unlikely, but this is really the crux of the question, since if we knew where the CUDA driver looks, we could look there ourselves. — einpoklum, Jul 25 '15 at 08:12

einpoklum · Accepted Answer · 2022-07-29T17:25:58.113

Yes, streams are device-specific.

In CUDA, streams are specific to a context, and contexts are specific to a device.

Now, with the runtime API, you don't "see" contexts: you use just one context per device (the device's primary context). But if you consider the driver API, you have:

CUresult cuStreamGetCtx ( CUstream hStream, CUcontext* pctx );

CUstream and cudaStream_t are the same thing - a pointer. So, you can get the context. Then, you set or push that context to be the current context (read about doing that elsewhere), and finally, you use:

CUresult cuCtxGetDevice ( CUdevice* device );

to get the current context's device.

So, a bit of a hassle, but quite doable.
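The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name `device_of_stream` is hypothetical, error handling is reduced to throwing an exception, and it assumes that the driver is already initialized (for example by an earlier runtime API call) and that the returned `CUdevice` value can be used as the runtime device index, which holds in practice but should be treated as an assumption.

```cuda
#include <cuda.h>          // CUDA driver API
#include <cuda_runtime.h>  // for cudaStream_t
#include <stdexcept>

// Hypothetical helper (not part of CUDA): returns the device of the
// context that owns `stream`.
inline int device_of_stream(cudaStream_t stream)
{
    auto check = [](CUresult status, const char* what) {
        if (status != CUDA_SUCCESS) { throw std::runtime_error(what); }
    };

    // Step 1: get the context the stream belongs to.
    CUcontext stream_context = nullptr;
    check(cuStreamGetCtx(stream, &stream_context), "cuStreamGetCtx failed");

    // Step 2: make that context current, just long enough to ask for its device.
    check(cuCtxPushCurrent(stream_context), "cuCtxPushCurrent failed");

    // Step 3: ask for the current context's device.
    CUdevice device = 0;
    CUresult get_status = cuCtxGetDevice(&device);

    // Restore whatever context was current before, even if step 3 failed.
    CUcontext popped = nullptr;
    check(cuCtxPopCurrent(&popped), "cuCtxPopCurrent failed");
    check(get_status, "cuCtxGetDevice failed");

    // Assumption: the CUdevice value doubles as the runtime device index.
    return device;
}
```

Since this uses the driver API, the program has to be linked against the driver library (typically `-lcuda`) in addition to the runtime. Pushing and popping the context, rather than setting it, leaves the caller's current context unchanged.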

My approach to easily determining a stream's device

My workaround for this issue is to have the (C++'ish) stream wrapper class keep (the context and) the device among the member variables, which means that you can write:

auto my_device = cuda::device::get(1);
auto my_stream = my_device.create_stream(); /* using some default param values here */
assert(my_stream.device() == my_device);

and not have to worry about it. This also avoids the extra API calls, since at construction we already know what the current context is and what its device is.

Note: The above snippet is for a system with at least two CUDA devices; otherwise there is no device with index 1.
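To illustrate the idea of remembering the device at construction, here is a minimal sketch using only the runtime API. The class name `stream_with_device` and its members are hypothetical; this is not how the library in the snippet above is implemented, just one way the same idea could look.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>

// Hypothetical minimal wrapper, for illustration only: it records the
// device index when the stream is created, so no API call is needed later.
class stream_with_device {
public:
    explicit stream_with_device(int device_id) : device_id_(device_id)
    {
        int previous = 0;
        check(cudaGetDevice(&previous));
        check(cudaSetDevice(device_id_));
        cudaError_t status = cudaStreamCreate(&handle_);  // created on device_id_
        cudaSetDevice(previous);  // best-effort restore of the caller's device
        check(status);
    }

    ~stream_with_device() { cudaStreamDestroy(handle_); }

    // Non-copyable, so the stream is destroyed exactly once.
    stream_with_device(const stream_with_device&) = delete;
    stream_with_device& operator=(const stream_with_device&) = delete;

    cudaStream_t handle() const { return handle_; }
    int device() const { return device_id_; }  // plain member read

private:
    static void check(cudaError_t status)
    {
        if (status != cudaSuccess) {
            throw std::runtime_error(cudaGetErrorString(status));
        }
    }

    int device_id_;
    cudaStream_t handle_ = nullptr;
};
```

With this, `stream_with_device s(0);` followed by `s.device()` yields 0 without touching the CUDA API again. The limitation is the one the question starts from: it only helps for streams you create yourself, not for a raw `cudaStream_t` handed to you by someone else.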

Just starting with CUDA, and wow, some of the legacy aspects of the API are painful. You would totally expect a stream to be able to tell you what device it is from, but it's just not there. And there's the NppStreamContext, which is what you have to use for streams with NPP, but requires manually initializing like 8 struct fields. Including the device. When I'm holding a stream. Thanks for starting and continuing to work on the library that I would expect NVIDIA to provide :-) . — aggieNick02, Aug 06 '20 at 22:01
Haha, I don't know - is NPP not popular? What do you use if you want to do common manipulations on the GPU (like adding a channel to an image or resizing an image) but aren't comfortable writing your own kernels yet? Unfortunately I don't know anybody at nVidia, but I'll still advocate if any opportunities arise. — aggieNick02, Aug 06 '20 at 22:45
@aggieNick02: Ah, ok, you're doing work on images. Never mind my comment, I was mistaking NPP for something else. — einpoklum, Aug 06 '20 at 23:21

score 0 · Answer 2 · answered Jul 17 '15 at 15:14

Regarding explicit streams: to the best of my knowledge, this is up to the implementation, and there is no API that offers this query to users. I don't know what the driver can provide on this front. However, you can always query the stream.

Using cudaStreamQuery, you can query the stream in question while a given device is selected. If it returns cudaSuccess or cudaErrorNotReady, the stream exists on that device; if it returns cudaErrorInvalidResourceHandle, it does not.

Iman


Am I guaranteed to get cudaErrorNotReady for a stream from device n when querying it with device n' selected? – einpoklum Jul 17 '15 at 18:49
Either cudaSuccess or cudaErrorNotReady will be returned for the queried stream if it exists on your selected device. The former shows that all operations in the stream are complete, while the latter shows that they are not; either way, the stream exists. If you get cudaErrorInvalidResourceHandle for a stream, it means that the stream doesn't exist at all. – Iman Jul 17 '15 at 19:10
But can the stream "exist" on two devices? To be more concrete, suppose that, internally, a stream value is just an index into its device's array of streams. So, stream 0 will always exist on all devices. Will I get a cudaSuccess from all GPUs with that cudaStream_t? – einpoklum Jan 22 '16 at 17:57
ping... about my last comment. – einpoklum Jul 30 '16 at 13:50
