Divergence in CUDA - exit from a thread in kernel

Question

I'm wondering how I can exit from a thread whose thread index is too big. I see two possibilities:

int i = threadIdx.x;
if(i >= count)
    return;
// do logic

or

int i = threadIdx.x;
if(i < count) {
    // do logic
}

I know that both are correct, but which one affects performance more?
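For reference, here is a minimal, self-contained sketch that wraps both snippets into complete kernels so they can be compared side by side. The kernel names, the `runSquare` host helper, the squaring "logic" and the block size of 128 are all hypothetical choices for illustration. The original snippets index with `threadIdx.x` alone, which only covers a single block; the sketch assumes a multi-block launch and therefore uses `blockIdx.x * blockDim.x + threadIdx.x`. Both kernels produce identical output.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Variant 1 (hypothetical name): out-of-range threads return immediately.
__global__ void squareEarlyReturn(const int* in, int* out, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count)
        return;                  // thread past the end exits the kernel
    out[i] = in[i] * in[i];      // "do logic"
}

// Variant 2 (hypothetical name): the work is guarded; every thread reaches the end.
__global__ void squareGuarded(const int* in, int* out, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        out[i] = in[i] * in[i];  // "do logic"
    }
}

// Hypothetical helper: squares 0..count-1 on the GPU with the chosen variant.
// The grid is rounded up, so the last block contains out-of-range threads
// whenever count is not a multiple of the block size. Assumes count >= 1.
std::vector<int> runSquare(bool useEarlyReturn, int count) {
    std::vector<int> host(count);
    for (int k = 0; k < count; ++k) host[k] = k;

    size_t bytes = count * sizeof(int);
    int *dIn = nullptr, *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dIn, bytes);
    assert(err == cudaSuccess);
    err = cudaMalloc(&dOut, bytes);
    assert(err == cudaSuccess);
    err = cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    const int threads = 128;
    const int blocks = (count + threads - 1) / threads;  // round up
    if (useEarlyReturn)
        squareEarlyReturn<<<blocks, threads>>>(dIn, dOut, count);
    else
        squareGuarded<<<blocks, threads>>>(dIn, dOut, count);
    err = cudaGetLastError();  // catches launch-configuration errors
    assert(err == cudaSuccess);

    // cudaMemcpy waits for the kernel to finish before copying back.
    err = cudaMemcpy(host.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dIn);
    cudaFree(dOut);
    return host;
}
```

Calling `runSquare(true, 1000)` and `runSquare(false, 1000)` should return the same vector. Equal output says nothing about timing; that needs a profiler.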

Both will give you the same performance. — sgarizvi, Feb 14 '13 at 07:10

score 4 · Accepted Answer · edited May 23 '17 at 12:02

Although both are the same in terms of performance, you should take into account that the first one is not recommended.

Returning from a thread within a kernel can cause unexpected behaviour in the rest of your code.

By unexpected behaviour I mean any problem related to the warp, the minimum unit in which threads are grouped and executed together. For example, if threads of the same warp take different paths through an if / else block in your kernel, this situation is known as thread divergence; normally it results in some threads remaining idle while others execute instructions.
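To make divergence concrete, here is a small sketch (kernel names and the `runBranch` helper are hypothetical). In the first kernel even and odd lanes of every warp take different branches, so each warp runs both paths with the non-participating lanes masked off. The second kernel branches on the warp index instead, so all 32 lanes of a warp follow the same path and no warp diverges. Both give correct results; only the execution efficiency differs.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Divergent: lanes of the same warp disagree on the branch, so the warp
// executes both sides, idling the lanes that did not take each side.
__global__ void divergentBranch(int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x % 2 == 0) {
        out[i] = i * 2;   // even lanes
    } else {
        out[i] = -i;      // odd lanes
    }
}

// Warp-uniform: the condition depends only on the warp index, so every
// lane of a given warp takes the same branch.
__global__ void warpUniformBranch(int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int warp = threadIdx.x / warpSize;  // warpSize is a CUDA built-in
    if (warp % 2 == 0) {
        out[i] = i * 2;   // even-numbered warps
    } else {
        out[i] = -i;      // odd-numbered warps
    }
}

// Hypothetical helper: launches one block of `threads` threads (assumes
// 1 <= threads <= 1024) and returns what each thread wrote.
std::vector<int> runBranch(bool divergent, int threads) {
    std::vector<int> host(threads);
    int* dOut = nullptr;
    cudaError_t err = cudaMalloc(&dOut, threads * sizeof(int));
    assert(err == cudaSuccess);
    if (divergent)
        divergentBranch<<<1, threads>>>(dOut);
    else
        warpUniformBranch<<<1, threads>>>(dOut);
    err = cudaGetLastError();
    assert(err == cudaSuccess);
    err = cudaMemcpy(host.data(), dOut, threads * sizeof(int),
                     cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dOut);
    return host;
}
```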

From the book CUDA by Example, Chapter 5, Thread Cooperation:

But in the case of __syncthreads(), the result is somewhat tragic. The CUDA Architecture guarantees that no thread will advance to an instruction beyond the __syncthreads() until every thread in the block has executed the __syncthreads()

So, the problem is mainly related to thread synchronization within a block. You can find a very good question/answer about this topic here: Can I use __syncthreads() after having dropped threads?
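A minimal sketch of the safe pattern, assuming a single-block sum of at most 256 elements (the block size, `blockSum` and `runBlockSum` are hypothetical). Instead of returning early, out-of-range threads load a neutral value of 0 and keep participating, so every `__syncthreads()` is reached by the whole block. The unsafe early-return version is shown only as a comment, because the CUDA C++ Programming Guide permits `__syncthreads()` in conditional code only when the condition evaluates identically across the entire block.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // assumed block size (power of two)

// Sums in[0..count-1] with one block using shared memory.
__global__ void blockSum(const int* in, int* out, int count) {
    __shared__ int partial[BLOCK];
    int t = threadIdx.x;

    // Unsafe alternative (do NOT do this):
    //     if (t >= count) return;
    // The surviving threads would then hit __syncthreads() in code that
    // not every thread of the block reaches.
    partial[t] = (t < count) ? in[t] : 0;  // guard the load, not the barrier
    __syncthreads();

    // Tree reduction: the loop bounds are the same for every thread,
    // so every thread executes each __syncthreads().
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (t < stride)
            partial[t] += partial[t + stride];
        __syncthreads();
    }
    if (t == 0)
        *out = partial[0];
}

// Hypothetical helper: sums 1 + 2 + ... + count on the GPU.
int runBlockSum(int count) {
    assert(count >= 1 && count <= BLOCK);  // single-block sketch
    std::vector<int> host(count);
    for (int k = 0; k < count; ++k) host[k] = k + 1;

    int *dIn = nullptr, *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dIn, count * sizeof(int));
    assert(err == cudaSuccess);
    err = cudaMalloc(&dOut, sizeof(int));
    assert(err == cudaSuccess);
    err = cudaMemcpy(dIn, host.data(), count * sizeof(int),
                     cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    // Always launch a full block, even when count < BLOCK.
    blockSum<<<1, BLOCK>>>(dIn, dOut, count);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    int result = 0;
    err = cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

The design choice is simply to move the bounds check from "who runs the kernel" to "who touches memory": the barrier stays unconditional, and only the loads and stores are guarded.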

As a final note, I have also used this bad practice without any problem appearing, but there is no guarantee that problems will not arise in the future. It is something that I would not recommend.

What do you mean by unexpected behavior? What is the problem? I have seen it in tutorials and I have also used it, and no problem has appeared so far. — George Aprilis, Feb 14 '13 at 08:44
@GeorgeAprilis The problem is mainly related to _good practices_ and future synchronization within a block. — pQB, Feb 15 '13 at 11:27
