In CUDA device code, the following if-else statement will cause divergence among the threads of a warp, resulting in two passes by the SIMD hardware. Assume Vs is a location in shared memory.
if (threadIdx.x % 2) {
    Vs[threadIdx.x] = 0;
} else {
    Vs[threadIdx.x] = 1;
}
I believe there will also be two passes when we have an if statement with no else branch. Why is this the case?
if (threadIdx.x % 2) {
    Vs[threadIdx.x] = 0;
}
Would the following if / else if / else statement be completed in three passes?
if (threadIdx.x < 10) {
    Vs[threadIdx.x] = 0;
} else if (threadIdx.x < 20) {
    Vs[threadIdx.x] = 1;
} else {
    Vs[threadIdx.x] = 2;
}