The question is simple: does it matter if some threads of a block reach __syncthreads()
and others do not? Take the following code:
    for (unsigned int s=blockDim.x/2; s>0; s>>=1) {
        if (tid < s) {
            sdata[tid] += sdata[tid + s];
        } else {
            break;
        }
        __syncthreads();
    }
Can this cause a deadlock or other problems? Should I put __syncthreads()
after the for loop instead, or is it fine as written?
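
For comparison, here is a minimal sketch of the same loop without the early break, so that every thread of the block reaches __syncthreads() on every iteration. The CUDA programming guide allows __syncthreads() in conditional code only when the condition evaluates identically across the whole thread block, which is the case this variant stays within. The kernel name reduceBlock, the host helper reduceOnDevice, the float element type, and the single-block launch are all assumptions made for illustration; only the loop body comes from the code above.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: name, parameters and element type are assumptions.
// Each block sums blockDim.x floats (blockDim.x must be a power of two).
__global__ void reduceBlock(const float *in, float *out)
{
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;

    // Stage this block's slice of the input in shared memory.
    sdata[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();  // all loads are visible before the first add

    // No early exit: threads with tid >= s skip the add
    // but still reach the barrier on every iteration.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads();
    }

    // Thread 0 holds the block's total.
    if (tid == 0) {
        out[blockIdx.x] = sdata[0];
    }
}

// Hypothetical host helper: sums n floats with one block.
// Assumes n is a power of two and n <= 1024; error checking omitted for brevity.
float reduceOnDevice(const float *h_in, unsigned int n)
{
    float *d_in = nullptr;
    float *d_out = nullptr;
    float result = 0.0f;

    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    // One block of n threads, n floats of dynamic shared memory.
    reduceBlock<<<1, n, n * sizeof(float)>>>(d_in, d_out);

    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

The only difference from the loop in the question is that the else branch with break is gone, so the barrier is no longer inside divergent control flow.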