CUDA streams and Streaming Multiprocessors

Question

NVIDIA's CUDA technology has two similarly named concepts: the stream in CUDA programming, and the Streaming Multiprocessor (called SMM in the Maxwell architecture, often shortened to SM). How should I understand the two?

Case I: I use only the default stream to execute a kernel, and the number of blocks is large enough. In this case, will all 5 of my Streaming Multiprocessors (the GTX 750 Ti has 5 SMMs, which is 640 cores) be engaged in processing the blocks, or will just one Streaming Multiprocessor be engaged in processing the single default stream?

Case II: I use cudaStreamCreate() to create 5 (or more) streams and use them to execute 5 different kernels, all of which are independent. Will the 5 kernels physically be processed in parallel?

score 2 · Answer 1 · answered Oct 11 '14 at 15:25

There is no connection between CUDA streams and Streaming Multiprocessors.

Regardless of which stream arrangement you use to launch a kernel, all the SMs will participate in executing that kernel, if there are enough blocks.
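One way to observe this is to have every block record which SM it ran on. The following is a minimal sketch, not a definitive test: the kernel name `recordSm`, the factor of 64 blocks per SM, and the block size of 128 are all arbitrary choices made for illustration. It reads the `%smid` PTX special register, whose values are only meant for diagnostics, and the way blocks are distributed across SMs is decided by the hardware scheduler.

```cuda
#include <cstdio>
#include <set>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 of each block stores the ID of the SM
// the block is running on, read from the %smid PTX special register.
__global__ void recordSm(unsigned int *smIds)
{
    if (threadIdx.x == 0) {
        unsigned int smid;
        asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
        smIds[blockIdx.x] = smid;
    }
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }

    // "Enough blocks": many more blocks than SMs (the factor 64 is arbitrary).
    const int numBlocks = 64 * prop.multiProcessorCount;
    unsigned int *dSmIds = nullptr;
    cudaMalloc(&dSmIds, numBlocks * sizeof(unsigned int));

    // No stream argument, so this launch goes into the default stream.
    recordSm<<<numBlocks, 128>>>(dSmIds);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::vector<unsigned int> hSmIds(numBlocks);
    cudaMemcpy(hSmIds.data(), dSmIds, numBlocks * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    // Count how many distinct SM IDs appeared across all blocks.
    std::set<unsigned int> distinct(hSmIds.begin(), hSmIds.end());
    printf("SMs on device: %d, SMs that ran at least one block: %zu\n",
           prop.multiProcessorCount, distinct.size());

    cudaFree(dSmIds);
    return 0;
}
```

Passing a non-default stream as the fourth launch parameter should not change which SMs are eligible to run the blocks; the stream only controls ordering relative to other commands.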

If you launch 5 kernels in 5 separate streams, your kernels will most likely execute approximately sequentially, unless all the kernels are very small in terms of resource usage, in which case they may execute at the same time.
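As a rough sketch of the "very small kernels" situation, the code below launches one single-block kernel into each of 5 streams. The kernel `smallKernel`, its loop count, and the buffer size are hypothetical values chosen only to make each kernel small but long enough to be visible on a profiler timeline; whether the kernels actually overlap is up to the device and should be checked with a profiler rather than assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical small kernel: one block, a little arithmetic per thread.
__global__ void smallKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 10000; ++k) {
            x = x * 0.999f + 0.001f;
        }
        data[i] = x;
    }
}

int main()
{
    const int kStreams = 5;
    const int n = 256;  // one block of 256 threads per kernel (arbitrary)
    cudaStream_t streams[kStreams];
    float *buffers[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buffers[s], n * sizeof(float));
        cudaMemset(buffers[s], 0, n * sizeof(float));
    }

    // Each launch goes into its own stream, so the runtime is allowed
    // (but not required) to run the kernels concurrently.
    for (int s = 0; s < kStreams; ++s) {
        smallKernel<<<1, n, 0, streams[s]>>>(buffers[s], n);
    }

    cudaError_t err = cudaDeviceSynchronize();
    printf("all streams finished: %s\n", cudaGetErrorString(err));

    for (int s = 0; s < kStreams; ++s) {
        cudaFree(buffers[s]);
        cudaStreamDestroy(streams[s]);
    }
    return err == cudaSuccess ? 0 : 1;
}
```

If each kernel were instead launched with enough blocks to fill the GPU, the same code would most likely show the kernels running one after another on the timeline.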

Robert Crovella

I think I get it. According to your explanation, creating many streams may not be helpful if kernels are complicated and their blocks are large. Is that right? – baowenbo Oct 11 '14 at 16:09
Streams are used to arrange asynchronous concurrent activities, including asynchronous concurrent execution (of kernels, and between device and host) and overlap of copy and compute operations. You should create as many streams as are dictated by the type of concurrent activity you want to arrange/manage. I think the question you want to ask is "should I launch multiple concurrent kernels?" In that case, yes, it "may not be helpful if kernels are complicated and their blocks are large" – Robert Crovella Oct 11 '14 at 16:48

score 2 · Answer 2 · answered Oct 11 '14 at 15:29

There are two concepts: the stream in CUDA programming, and the Streaming Multiprocessor (called SMM in the Maxwell architecture, often shortened to SM). How should I understand the two?

Despite the similar terminology, the two concepts are unrelated.

A streaming multiprocessor is a hardware component composed of several streaming processors (CUDA cores) that execute your kernel in a SIMD-like fashion (NVIDIA calls this model SIMT).

A stream is just a command queue on which you queue commands (yeah...) such as kernel executions or memory copies. Commands within one stream execute in order, while commands in different streams may execute concurrently, so if you have two independent kernels, you may want to launch them in separate streams for (possibly) improved performance. You may also overlap kernel execution and data transfers if your device supports it.
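To illustrate the command-queue view, here is a minimal sketch that splits an array into two chunks and gives each chunk its own stream with a copy-in, kernel, copy-out sequence. The kernel `scale`, the chunk size, and the two-stream split are assumptions made for the example. Pinned host memory (`cudaMallocHost`) is used because asynchronous copies generally need it in order to overlap with kernel execution.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element by a constant factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    const int kChunks = 2;       // one stream per chunk (arbitrary choice)
    const int chunk = 1 << 20;   // elements per chunk (arbitrary choice)
    const size_t chunkBytes = chunk * sizeof(float);

    float *hData = nullptr;
    float *dData = nullptr;
    cudaMallocHost(&hData, kChunks * chunkBytes);  // pinned host memory
    cudaMalloc(&dData, kChunks * chunkBytes);
    for (int i = 0; i < kChunks * chunk; ++i) {
        hData[i] = 1.0f;
    }

    cudaStream_t streams[kChunks];
    for (int c = 0; c < kChunks; ++c) {
        cudaStreamCreate(&streams[c]);
    }

    // Within one stream the three commands run in order; commands in the
    // other stream may overlap with them if the device supports it.
    for (int c = 0; c < kChunks; ++c) {
        float *h = hData + c * chunk;
        float *d = dData + c * chunk;
        cudaMemcpyAsync(d, h, chunkBytes, cudaMemcpyHostToDevice, streams[c]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[c]>>>(d, chunk, 2.0f);
        cudaMemcpyAsync(h, d, chunkBytes, cudaMemcpyDeviceToHost, streams[c]);
    }

    cudaError_t err = cudaDeviceSynchronize();
    bool ok = (err == cudaSuccess) &&
              hData[0] == 2.0f && hData[kChunks * chunk - 1] == 2.0f;
    printf("%s\n", ok ? "results look correct" : "something went wrong");

    for (int c = 0; c < kChunks; ++c) {
        cudaStreamDestroy(streams[c]);
    }
    cudaFree(dData);
    cudaFreeHost(hData);
    return ok ? 0 : 1;
}
```

The result is the same whether or not any overlap happens; streams only relax the ordering between independent commands, so the benefit shows up in timing rather than in output.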

Case I: I use only the default stream to execute a kernel, and the number of blocks is large enough. In this case, will all 5 of my Streaming Multiprocessors (the GTX 750 Ti has 5 SMMs, which is 640 cores) be engaged in processing the blocks, or will just one Streaming Multiprocessor be engaged in processing the single default stream?

Assuming the block number is large enough, all SMs will be busy.

Case II: I use cudaStreamCreate() to create 5 (or more) streams and use them to execute 5 different kernels, all of which are independent. Will the 5 kernels physically be processed in parallel?

That's up to the scheduler. If your kernel computations can overlap (that is, no single kernel fully utilizes the GPU), then they most likely will, although concurrent execution is never guaranteed.
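A rough way to judge whether a kernel leaves room for others is to compare its grid size with the number of blocks the device can keep resident at once. The sketch below assumes a hypothetical kernel `work`, a block size of 256, and a grid of 8 blocks; it uses the runtime's occupancy query and the `concurrentKernels` device property. This is only an estimate, since the actual scheduling decision also depends on factors not modelled here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel whose resource usage we want to inspect.
__global__ void work(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i] * 2.0f + 1.0f;
    }
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }

    const int blockSize = 256;  // assumed launch configuration
    const int gridSize = 8;     // assumed launch configuration

    // How many blocks of this kernel can be resident on one SM at once.
    int blocksPerSm = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, work,
                                                  blockSize, 0);

    const int residentBlocks = blocksPerSm * prop.multiProcessorCount;
    printf("device supports concurrent kernels: %d\n", prop.concurrentKernels);
    printf("resident block capacity: %d, blocks in this grid: %d\n",
           residentBlocks, gridSize);

    if (gridSize < residentBlocks) {
        printf("this grid does not fill the GPU; kernels in other streams "
               "may be able to overlap with it\n");
    } else {
        printf("this grid can fill the GPU on its own; overlap with other "
               "kernels is unlikely\n");
    }
    return 0;
}
```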

Thank you. I used to think that only one SM of the GPU was running, and that I would get a 5x speedup by using 5 streams. – baowenbo, Oct 11 '14 at 16:02
