Branches in general do not significantly affect performance, but branch divergence does. Divergence means that two threads take different paths (e.g. one satisfies the `if` condition and the other does not). Because all threads of a GPU execute the same "line of code" in lockstep, some threads have to sit idle while the code that is not part of their path is executed.
Strictly speaking, that is not quite right: only the threads within one warp (NVIDIA) or wavefront (AMD) execute the same "line of code" together. (The warp size of NVIDIA GPUs is 32. The wavefront size of AMD GPUs has traditionally been 64, although newer RDNA GPUs also support 32-wide wavefronts.)
So if there is an `if-else` block in your kernel, the worst case is indeed a 50% performance drop. Even worse: if there are `n` possible branches, performance can drop to `1/n` of the performance without divergence (that is, with no branches, or with all threads in a warp/wavefront taking the same path). Of course, these worst cases only occur if essentially your whole kernel is embedded in an `if-else` (or `switch`) construct and the branches do comparable amounts of work.
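To make the 50% and `1/n` figures concrete, here is a minimal Python sketch of a simplified cost model. The model is an assumption, not how any particular GPU schedules instructions: each warp runs, one after the other, every distinct branch path that at least one of its threads takes, and every branch body costs the same. The helpers `simulated_passes` and `efficiency` are hypothetical names used only for this illustration.

```python
# Simplified SIMT cost model (assumption): a warp executes, one after another,
# every distinct branch path taken by at least one of its threads, and each
# branch body costs one time unit. This models the cost; it is not GPU code.
from typing import Callable


def simulated_passes(num_threads: int, warp_size: int,
                     branch_of: Callable[[int], int]) -> int:
    """Total number of branch-body executions summed over all warps."""
    passes = 0
    for start in range(0, num_threads, warp_size):
        end = min(start + warp_size, num_threads)
        # Distinct branch paths taken by the threads of this warp.
        paths = {branch_of(t) for t in range(start, end)}
        passes += len(paths)
    return passes


def efficiency(num_threads: int, warp_size: int,
               branch_of: Callable[[int], int]) -> float:
    """Throughput relative to the divergence-free case (one pass per warp)."""
    warps = -(-num_threads // warp_size)  # ceiling division
    return warps / simulated_passes(num_threads, warp_size, branch_of)


THREADS, WARP = 1024, 32

# if-else on thread parity: both paths occur in every warp.
print(efficiency(THREADS, WARP, lambda t: t % 2))
# n = 4 branches selected by t % 4.
print(efficiency(THREADS, WARP, lambda t: t % 4))
# Same if-else, but the condition depends only on the warp index.
print(efficiency(THREADS, WARP, lambda t: (t // WARP) % 2))
```

Under this model the three calls give 0.5, 0.25 and 1.0: the same two-way branch costs nothing extra once its condition is uniform within each warp. The model also shows that `n` is effectively capped at the warp size: with `t % 64` and 32-wide warps, each warp still sees only 32 distinct paths, so efficiency bottoms out at 1/32.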
But, as explained above, this only happens if the threads taking different paths are in the same warp/wavefront. So it is up to you to write your code, rearrange your data, choose your algorithm, etc. so that branch divergence is avoided as far as possible.
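One way to "rearrange data" is to group work items by the branch they will take before launching the kernel. The Python sketch below estimates the effect under a simplified cost model (an assumption): thread `i` handles item `i`, and a warp pays one pass for each distinct branch key among its threads. The `kinds` list and the helper `passes_for` are hypothetical, and this is a cost estimate, not GPU code.

```python
# Hypothetical scenario: each work item needs one of 4 code paths ("kinds"),
# and thread i processes item i. Assumed cost model: a warp pays one pass
# per distinct branch key among its threads.
import random


def passes_for(keys: list[int], warp_size: int) -> int:
    """Sum over warps of the number of distinct branch keys in that warp."""
    return sum(len(set(keys[i:i + warp_size]))
               for i in range(0, len(keys), warp_size))


random.seed(0)
WARP = 32
# 4096 work items in random order, each needing one of 4 branches.
kinds = [random.randrange(4) for _ in range(4096)]
warps = len(kinds) // WARP

unsorted_passes = passes_for(kinds, WARP)
# Grouping items by branch key puts equal keys next to each other,
# so only warps straddling a key boundary still diverge.
sorted_passes = passes_for(sorted(kinds), WARP)

print(warps, unsorted_passes, sorted_passes)
```

After sorting, at most one warp per key boundary still contains two paths, so the pass count approaches one per warp, while the random order makes nearly every warp run all four paths. On a real GPU the sort or partition itself costs time, so this only pays off when the branch bodies are expensive enough.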
TL;DR: Branches are fine, but if different threads take different branches, they should be in different warps/wavefronts to avoid divergence and the resulting performance loss.