Let's say an SM has been populated with 8 blocks of 64 threads each.
That gives us 2 warps per block and 16 warps in total. An SM can switch between resident warps in order to hide latency. Must the warps it switches between belong to the same block, or can a warp from block 5 be replaced by a warp from block 8, for example?
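
To make the numbers above concrete, here is a minimal host-side sketch of the warp arithmetic. It assumes a warp size of 32 threads (the value on NVIDIA GPUs to date, available in device code as `warpSize`); the names `kWarpSize`, `warps_per_block` and `resident_warps` are hypothetical helpers made up for this illustration and are not part of the CUDA API.

```cpp
#include <cassert>

// Assumption: 32 threads per warp.
const int kWarpSize = 32;

// Hypothetical helper: number of warps one block occupies.
// A partially filled warp still takes a whole warp slot, so round up.
inline int warps_per_block(int threads_per_block) {
    assert(threads_per_block > 0);
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Hypothetical helper: total warps resident on one SM, given how many
// blocks are resident on it and how many threads each block has.
inline int resident_warps(int blocks_per_sm, int threads_per_block) {
    assert(blocks_per_sm >= 0);
    return blocks_per_sm * warps_per_block(threads_per_block);
}
```

With the configuration from the question, `warps_per_block(64)` evaluates to 2 and `resident_warps(8, 64)` evaluates to 16.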