This is documented to be the case in multiple places:
- Programming guide:
Kernel launches and host <-> device memory copies that do not specify any stream parameter, or equivalently that set the stream parameter to zero, are issued to the default stream.
- For example, cudaMemcpyAsync: The copy can optionally be associated to a stream by passing a non-zero stream argument.
It seems quite safe to assume that the default stream is equivalent to (cudaStream_t)0.
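As a minimal sketch of what the quoted passages describe, the following launches the same kernel with no stream argument, with a literal 0, and with (cudaStream_t)0. The kernel name increment and the helper run_default_stream_demo are made up for illustration, and error checking is omitted for brevity. It assumes a working CUDA device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to each element.
__global__ void increment(int *data) { data[threadIdx.x] += 1; }

// Hypothetical helper: issues all work to the default stream in three
// equivalent ways and returns the first element of the result.
int run_default_stream_demo() {
    const int n = 32;
    int host[n] = {0};
    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));

    // Stream argument of 0: the copy is issued to the default stream.
    cudaMemcpyAsync(dev, host, n * sizeof(int), cudaMemcpyHostToDevice, 0);

    increment<<<1, n>>>(dev);                       // no stream parameter
    increment<<<1, n, 0, 0>>>(dev);                 // stream parameter set to zero
    increment<<<1, n, 0, (cudaStream_t)0>>>(dev);   // explicit cast of zero

    cudaMemcpyAsync(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost, 0);
    cudaStreamSynchronize(0);                       // wait on the default stream

    cudaFree(dev);
    return host[0];
}

int main() {
    printf("host[0] = %d\n", run_default_stream_demo());
    return 0;
}
```

Since all five operations land in the same stream, they execute in issue order, so each element should have been incremented three times by the time the final copy completes.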
Note that you can call out the default stream specifically with an argument other than zero, using cudaStreamLegacy (or cudaStreamPerThread) as described here. Interestingly, in CUDA 11.4, cudaStreamLegacy is a #define in driver_types.h as follows:
#define cudaStreamLegacy ((cudaStream_t)0x1)
This probably makes sense: cudaStreamLegacy always refers to the legacy default stream, whereas a stream argument of 0 refers to whichever default stream is currently in effect, whether that happens to be the legacy default stream or the per-thread default stream, as indicated in the previously linked blog.
Similarly, cudaStreamPerThread is defined to be (cudaStream_t)2.
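To make the distinction concrete, here is a small sketch that passes the two special handles explicitly and prints their numeric values. The kernel noop and the helper handle_value are hypothetical names used only for this example. The printed values depend on the definitions in the installed driver_types.h, so treat the 1 and 2 mentioned above as what the CUDA 11.4 header gives rather than a guarantee.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, used only to have something to launch.
__global__ void noop() {}

// Hypothetical helper: reinterprets a stream handle as an integer so the
// special values from driver_types.h can be inspected.
static std::uintptr_t handle_value(cudaStream_t s) {
    return reinterpret_cast<std::uintptr_t>(s);
}

int main() {
    // Explicitly target the legacy default stream, regardless of how the
    // translation unit was compiled.
    noop<<<1, 1, 0, cudaStreamLegacy>>>();

    // Explicitly target the calling thread's per-thread default stream.
    noop<<<1, 1, 0, cudaStreamPerThread>>>();

    cudaStreamSynchronize(cudaStreamLegacy);
    cudaStreamSynchronize(cudaStreamPerThread);

    printf("cudaStreamLegacy    = %llu\n",
           (unsigned long long)handle_value(cudaStreamLegacy));
    printf("cudaStreamPerThread = %llu\n",
           (unsigned long long)handle_value(cudaStreamPerThread));
    return 0;
}
```

Unlike a stream argument of 0, these two handles keep the same meaning whether or not the file is built with nvcc's --default-stream per-thread option.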