In NVIDIA's CUDA technology there are two concepts with similar names: the stream in CUDA programming, and the Streaming Multiprocessor (usually shortened to SM, and called SMM in the Maxwell architecture). How should I understand the relationship between the two?
Case I: I use only the default stream to execute a kernel, and the number of blocks is large enough. In this case, will all 5 of my Streaming Multiprocessors (the GTX 750 Ti has 5 SMMs, 640 cores in total) be engaged in processing the blocks, or will just one Streaming Multiprocessor be engaged in processing the single default stream?
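
To make Case I concrete, here is a minimal sketch of the kind of launch in question. The kernel `scale`, the array size, and the block size are hypothetical placeholders chosen for illustration. The sketch only sets up the scenario and prints the SM count reported by the runtime; it does not show how blocks are distributed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: scales each element in place.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    // Ask the runtime how many SMs device 0 has.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    printf("SM count: %d\n", prop.multiProcessorCount);

    const int n = 1 << 20;                           // assumed problem size
    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;  // many more blocks than SMs

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    // No stream argument is given, so this launch uses the default stream.
    scale<<<blocks, threads>>>(d_data, n, 2.0f);

    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```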
Case II: I use cudaStreamCreate() to create 5 (or more) streams and use them to execute 5 different kernels, all of which are independent of each other. Will the 5 kernels physically be processed in parallel?
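
Case II could look roughly like the sketch below. As a simplifying assumption, one hypothetical kernel `addConstant` is launched five times on separate buffers to stand in for five independent kernels; the sizes are placeholders too. The `concurrentKernels` field only reports whether the device can run kernels concurrently at all, not whether these particular launches actually overlap.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: adds a constant to every element.
__global__ void addConstant(float *data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += value;
}

int main()
{
    const int kNumStreams = 5;                       // matches the 5 streams in the question
    const int n = 1 << 16;                           // assumed size of each buffer
    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    printf("concurrentKernels: %d\n", prop.concurrentKernels);

    cudaStream_t streams[kNumStreams];
    float *buffers[kNumStreams];

    // One stream and one independent buffer per kernel launch.
    for (int s = 0; s < kNumStreams; ++s) {
        if (cudaStreamCreate(&streams[s]) != cudaSuccess) return 1;
        if (cudaMalloc(&buffers[s], n * sizeof(float)) != cudaSuccess) return 1;
        cudaMemsetAsync(buffers[s], 0, n * sizeof(float), streams[s]);
    }

    // The fourth launch parameter selects the stream; the third is the
    // dynamic shared memory size (0 here).
    for (int s = 0; s < kNumStreams; ++s) {
        addConstant<<<blocks, threads, 0, streams[s]>>>(buffers[s], n, (float)s);
    }

    // Wait for the work in all streams to finish.
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    for (int s = 0; s < kNumStreams; ++s) {
        cudaFree(buffers[s]);
        cudaStreamDestroy(streams[s]);
    }
    return err == cudaSuccess ? 0 : 1;
}
```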