The CUDA programming guide states that
__syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects.
So if I need to synchronize threads inside a conditional branch, and only some threads in the block take the branch that contains the __syncthreads() call, does this mean that it won't work?
I imagine there are all sorts of cases where you might need to do this; for example, if you have a binary mask and need to apply a certain operation to pixels conditionally. Say, if (mask(x, y) != 0) then execute the code that includes __syncthreads(), otherwise do nothing. How would that be done?
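To make the masked scenario concrete, here is a minimal sketch of one common way to restructure it: every thread in the block reaches the same __syncthreads(), and only the work around the barrier is made conditional on the mask. The kernel name maskedOp, the helper runMaskedExample, the BLOCK size, the 1D layout, and the "add the left neighbour" operation are all assumptions made up for illustration; they are not taken from the programming guide.

```cuda
#include <cuda_runtime.h>

#define BLOCK 128  // threads per block (assumed; the launch below must match)

// Hypothetical masked operation: where mask != 0, a pixel is replaced by the
// sum of itself and its left neighbour inside the same block's tile.
__global__ void maskedOp(const float* in, const unsigned char* mask,
                         float* out, int n)
{
    __shared__ float tile[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool inRange = (i < n);
    bool active  = inRange && mask[i] != 0;

    // Every thread loads, masked or not, because a neighbour may need the value.
    // Note there is no early return before the barrier.
    tile[threadIdx.x] = inRange ? in[i] : 0.0f;

    // The barrier sits outside any divergent branch, so the whole block reaches it.
    __syncthreads();

    // Only the work itself depends on the mask.
    if (active) {
        float left = (threadIdx.x > 0) ? tile[threadIdx.x - 1] : 0.0f;
        out[i] = tile[threadIdx.x] + left;
    } else if (inRange) {
        out[i] = in[i];  // unmasked pixels pass through unchanged
    }
}

// Host-side check: runs the kernel and returns the number of mismatches
// against a CPU reference.
int runMaskedExample()
{
    const int n = 1000;
    float hIn[n], hOut[n];
    unsigned char hMask[n];
    for (int i = 0; i < n; ++i) {
        hIn[i]   = (float)i;
        hMask[i] = (i % 3 == 0) ? 1 : 0;
    }

    float *dIn, *dOut;
    unsigned char* dMask;
    cudaMalloc(&dIn,   n * sizeof(float));
    cudaMalloc(&dOut,  n * sizeof(float));
    cudaMalloc(&dMask, n * sizeof(unsigned char));
    cudaMemcpy(dIn,   hIn,   n * sizeof(float),         cudaMemcpyHostToDevice);
    cudaMemcpy(dMask, hMask, n * sizeof(unsigned char), cudaMemcpyHostToDevice);

    int blocks = (n + BLOCK - 1) / BLOCK;
    maskedOp<<<blocks, BLOCK>>>(dIn, dMask, dOut, n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        float left     = (i % BLOCK > 0) ? hIn[i - 1] : 0.0f;
        float expected = hMask[i] ? hIn[i] + left : hIn[i];
        if (hOut[i] != expected) ++mismatches;
    }

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dMask);
    return mismatches;
}
```

Calling runMaskedExample() from a main function on a CUDA-capable device should return 0 mismatches. The design choice is that the mask decides what each thread does before and after the barrier, never whether it reaches the barrier; the same idea applies if the conditional code needs several synchronization points.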