It is my understanding that if I have CUDA code of the form:
    if (condition) {
        // do x
    }
    else {
        // do y
    }
Then, due to the SIMT execution of threads in a warp, the execution of the conditional will be serialized and all threads will be required to run both the x and y sections of the code. The exception to this is if the branches are big, in which case the compiler will insert a check using __any to avoid unnecessarily running code.
However, if I already know ahead of time that all threads in a warp will have the same value of condition, then this __any operation is unnecessary, merely serving to slow down my code.
I am wondering if there exists any way to instruct the compiler not to include this voting operation, but instead to assume that the evaluation of the condition is the same for all threads in the warp, and to run only the corresponding block of code?
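
To make the scenario concrete, here is a minimal sketch of the kind of code I mean. The kernel name uniformBranch, the output buffer, and the per-warp condition are all made up for illustration. The condition depends only on the warp index, so (assuming a 1D block whose size is a multiple of warpSize) every thread in a given warp takes the same branch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the condition depends only on the warp index,
// so all threads of a given warp evaluate it to the same value.
__global__ void uniformBranch(float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Assumes a 1D block whose size is a multiple of warpSize.
    bool condition = ((threadIdx.x / warpSize) % 2) == 0;

    if (condition) {
        out[tid] = 1.0f;   // "do x"
    } else {
        out[tid] = 2.0f;   // "do y"
    }
}

int main()
{
    const int n = 128;     // four warps of 32 threads in one block
    float *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));

    uniformBranch<<<1, n>>>(d_out, n);

    float h_out[n];
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Print the first element written by each warp.
    for (int w = 0; w < n / 32; ++w)
        printf("warp %d -> %.1f\n", w, h_out[w * 32]);
    return 0;
}
```

Here even-numbered warps only ever need the x branch and odd-numbered warps only the y branch, which is exactly the case where I would like to avoid any per-warp vote.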