Yes, streams are device-specific.
In CUDA, streams are specific to a context, and contexts are specific to a device.
Now, with the runtime API, you don't "see" contexts - you implicitly use just one context per device (the device's primary context). But if you consider the driver API - you have:
CUresult cuStreamGetCtx ( CUstream hStream, CUcontext* pctx );
CUstream and cudaStream_t are the same thing - a pointer - so you can pass a runtime-API stream to that function and get its context. Then, you set or push that context to be the current context (read about doing that elsewhere), and finally, you use:
CUresult cuCtxGetDevice ( CUdevice* device );
to get the current context's device.
So, a bit of a hassle, but quite doable.
My approach to easily determining a stream's device
My workaround for this issue is to have the (C++'ish) stream wrapper class keep the device (and the context) as member variables, recorded when the stream is created, which means that you can write:
auto my_device = cuda::device::get(1);
auto my_stream = my_device.create_stream(); /* using some default param values here */
assert(my_stream.device() == my_device);
and not have to worry about it. This also doesn't trigger the extra API calls, since, at construction, we already know what the current context is and what its device is.
Note: The above snippet is for a system with at least two CUDA devices; device indices start at 0, so otherwise there is no device with index 1...