The question is simple: does it matter if some threads of a block reach __syncthreads() and others do not? Take the following code.

for (unsigned int s=blockDim.x/2; s>0; s>>=1) {
    if (tid < s) {
        sdata[tid] += sdata[tid + s];
    } else {
        break;
    }
    __syncthreads();
}

Can this cause a deadlock or other problems? Should I put __syncthreads() after the for loop, or is it fine like this?

talonmies
  • `__syncthreads()` must be called in every thread. You must not call it conditional on some variable data like you do here. (I believe this is [well documented](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#synchronization-functions).) – Kerrek SB Oct 05 '14 at 18:17
  • I'll check the whole documentation, however I used it in my code and everything went well and the algorithm was faster too. – Róbert Birkus Oct 05 '14 at 18:23
  • Please never ever get a professional job writing software with that attitude, and especially stay away from banking, defense and playground software. – Kerrek SB Oct 05 '14 at 18:26

1 Answer

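It is undefined behavior: threads with `tid >= s` break out of the loop and never reach the barrier that the remaining threads of the block are waiting at. From the CUDA C Programming Guide: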

__syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects.

Read more at: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
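
One way the loop from the question could be restructured, as a minimal sketch: drop the `else break` so that idle threads skip the addition but still reach the barrier on every iteration. The loop condition depends only on `blockDim.x`, so it evaluates identically across the block. The kernel name `reduceBlock`, the host helper `sumOnes`, and the power-of-two block size are my assumptions, not part of the question.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each block sums blockDim.x floats into out[blockIdx.x].
// Assumes blockDim.x is a power of two and the input holds gridDim.x * blockDim.x elements.
__global__ void reduceBlock(const float *in, float *out) {
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;

    sdata[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();  // all loads into shared memory are visible before reducing

    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            sdata[tid] += sdata[tid + s];
        }
        // No break: threads with tid >= s do no work but still reach the barrier.
        __syncthreads();
    }

    if (tid == 0) {
        out[blockIdx.x] = sdata[0];
    }
}

// Hypothetical host helper: sums n ones in a single block (n a power of two, n <= 1024).
float sumOnes(unsigned int n) {
    float *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (unsigned int i = 0; i < n; ++i) in[i] = 1.0f;

    // Third launch parameter sizes the dynamic shared array sdata.
    reduceBlock<<<1, n, n * sizeof(float)>>>(in, out);
    cudaDeviceSynchronize();

    float result = out[0];
    cudaFree(in);
    cudaFree(out);
    return result;
}
```

The idle threads cost very little here, since they only evaluate the `if` and wait at the barrier. Any speedup seen with the early `break` comes from skipping synchronization that the reduction relies on for correctness.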

Marco A.