
I'm wondering how I can exit from a thread whose thread index is too big. I see two possibilities:

int i = threadIdx.x;
if(i >= count)
    return;
// do logic

or

int i = threadIdx.x;
if(i < count) {
    // do logic
}

I know that both are correct, but which one has the bigger impact on performance?
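For reference, here is a sketch of both variants as complete kernels. It assumes a 1D launch with the usual global index `blockIdx.x * blockDim.x + threadIdx.x` (the snippets above use only `threadIdx.x`). The kernel names, the doubling used as "logic", and the `runScale`/`check` host helpers are hypothetical. Neither kernel calls `__syncthreads()`, so both produce the same output.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Sketch-level error handling: abort on any failed CUDA call.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::abort();
    }
}

// Variant 1: surplus threads return immediately.
__global__ void scaleEarlyReturn(const float* in, float* out, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count)
        return;
    out[i] = 2.0f * in[i];   // "do logic"
}

// Variant 2: the body is wrapped in a bounds check.
__global__ void scaleGuarded(const float* in, float* out, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        out[i] = 2.0f * in[i];   // "do logic"
    }
}

// Hypothetical host helper: runs one variant on 0, 1, ..., count-1.
static std::vector<float> runScale(bool earlyReturn, int count)
{
    assert(count > 0);
    std::vector<float> in(count), out(count);
    for (int k = 0; k < count; ++k) in[k] = static_cast<float>(k);

    float *dIn = nullptr, *dOut = nullptr;
    const size_t bytes = count * sizeof(float);
    check(cudaMalloc(&dIn, bytes));
    check(cudaMalloc(&dOut, bytes));
    check(cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    // Rounding up means the last block usually has surplus threads.
    const int blocks = (count + threads - 1) / threads;
    if (earlyReturn)
        scaleEarlyReturn<<<blocks, threads>>>(dIn, dOut, count);
    else
        scaleGuarded<<<blocks, threads>>>(dIn, dOut, count);
    check(cudaGetLastError());
    check(cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost));

    check(cudaFree(dIn));
    check(cudaFree(dOut));
    return out;
}
```

With `count = 1000` and 256 threads per block, 1024 threads are launched and the 24 surplus ones are exactly the threads either guard has to handle.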

TheWhiteRabbit
Tomasz Dzięcielewski

1 Answer


Although both perform the same, keep in mind that the first one is not recommended.

Returning from a thread within a kernel can cause unexpected behaviour in the rest of your code.

By unexpected behaviour I mean any problem related to the warp, the minimum unit in which threads are grouped and executed. For example, if threads of the same warp take different paths through an if/else block in your kernel, the situation is known as thread divergence; normally it results in some threads remaining idle while others execute instructions.
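A minimal sketch of divergence, assuming a 1D launch; `divergentBranch`, `runDivergent` and `check` are hypothetical names. Even and odd lanes of each warp take different branches of the same `if`/`else`, so the results are correct but the warp has to execute both branches, with the lanes not on the current branch idle. An early `return` is just another such branch: in the last block, the lanes with `i >= count` go idle while the others continue.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Sketch-level error handling: abort on any failed CUDA call.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::abort();
    }
}

__global__ void divergentBranch(int* out, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        // Lanes of one warp disagree on this condition, so the warp
        // runs both branches, masking off the lanes that did not take each one.
        if (threadIdx.x % 2 == 0)
            out[i] = i * 10;   // even lanes active, odd lanes idle
        else
            out[i] = -i;       // odd lanes active, even lanes idle
    }
}

// Hypothetical host helper: launches the kernel and returns its output.
static std::vector<int> runDivergent(int count)
{
    assert(count > 0);
    std::vector<int> out(count);
    int* dOut = nullptr;
    const size_t bytes = count * sizeof(int);
    check(cudaMalloc(&dOut, bytes));

    const int threads = 128;   // even block size, so threadIdx.x parity matches i parity
    const int blocks = (count + threads - 1) / threads;
    divergentBranch<<<blocks, threads>>>(dOut, count);
    check(cudaGetLastError());
    check(cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(dOut));
    return out;
}
```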

From the book CUDA by Example, Chapter 5, Thread Cooperation:

But in the case of __syncthreads(), the result is somewhat tragic. The CUDA Architecture guarantees that no thread will advance to an instruction beyond the __syncthreads() until every thread in the block has executed the __syncthreads()

So the problem is mainly related to thread synchronization within a kernel. There is a very good question and answer on this topic here: Can I use __syncthreads() after having dropped threads?
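The CUDA Programming Guide only allows `__syncthreads()` inside conditional code if the condition evaluates identically across the whole block; otherwise the kernel may hang or produce unintended side effects. The usual workaround is to keep every thread alive until the last barrier and guard only the memory accesses. The sketch below uses a block-level sum as the example (my choice, not taken from the book or the linked question) and assumes a 1D launch with a power-of-two block size; `blockSum`, `runBlockSum`, `kThreads` and `check` are hypothetical names.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Sketch-level error handling: abort on any failed CUDA call.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::abort();
    }
}

constexpr int kThreads = 256;  // block size; must be a power of two for this reduction

// Each block sums its slice of `in` into blockSums[blockIdx.x].
__global__ void blockSum(const int* in, int* blockSums, int count)
{
    __shared__ int partial[kThreads];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // No `if (i >= count) return;` here: surplus threads load a neutral 0
    // and stay in the block, so they reach every __syncthreads() below.
    partial[threadIdx.x] = (i < count) ? in[i] : 0;
    __syncthreads();

    // Tree reduction in shared memory. The barrier sits outside the `if`,
    // so all threads of the block execute it on every iteration.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        blockSums[blockIdx.x] = partial[0];
}

// Hypothetical host helper: sums `count` ones, so the result should be `count`.
static long long runBlockSum(int count)
{
    assert(count > 0);
    std::vector<int> in(count, 1);
    const int blocks = (count + kThreads - 1) / kThreads;
    std::vector<int> sums(blocks);

    int *dIn = nullptr, *dSums = nullptr;
    check(cudaMalloc(&dIn, count * sizeof(int)));
    check(cudaMalloc(&dSums, blocks * sizeof(int)));
    check(cudaMemcpy(dIn, in.data(), count * sizeof(int), cudaMemcpyHostToDevice));
    blockSum<<<blocks, kThreads>>>(dIn, dSums, count);
    check(cudaGetLastError());
    check(cudaMemcpy(sums.data(), dSums, blocks * sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(dIn));
    check(cudaFree(dSums));

    long long total = 0;
    for (int s : sums) total += s;   // finish the reduction on the host
    return total;
}
```

With `count = 1000` the last block has 24 surplus threads; they still execute every `__syncthreads()`, which is exactly what an early `return` at the top of the kernel would not guarantee.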

As a final note, I have also used that bad practice without any problem appearing, but there is no guarantee that problems will not arise in the future. It is something I would not recommend.

pQB