How do you do parallel selection efficiently?
For example, given the scalar code below, is there a way to write it so that the Cg compiler will execute it in parallel (SIMD), and potentially with a branch-free selection as well?
Out.x = (A.x <= threshold) ? B.x : C.x;
Out.y = (A.y <= threshold) ? B.y : C.y;
Out.z = (A.z <= threshold) ? B.z : C.z;
Out.w = (A.w <= threshold) ? B.w : C.w;
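
For reference, here is a sketch of the vector forms I would expect to be equivalent. It assumes that A, B, C and Out are float4 and that threshold is a scalar float. The function names are made up for illustration. The first version relies on Cg's component-wise ?: with a bool4 condition; the second replaces the selection with arithmetic using the standard library's step and lerp. I have not verified what either compiles to on a given profile, so whether the result is actually branch-free is an open question.

```cg
// Hypothetical helper names; float4 inputs and a scalar threshold are assumptions.
float4 selectLessEqual(float4 A, float threshold, float4 B, float4 C)
{
    // Broadcast the scalar threshold so the comparison is component-wise.
    float4 t = float4(threshold, threshold, threshold, threshold);

    // The comparison yields a bool4 mask, one flag per component.
    bool4 mask = (A <= t);

    // With a bool4 condition, ?: selects B or C independently per component.
    return mask ? B : C;
}

// Arithmetic alternative: no comparison mask, only step and lerp.
float4 selectLessEqualLerp(float4 A, float threshold, float4 B, float4 C)
{
    float4 t = float4(threshold, threshold, threshold, threshold);

    // step(a, x) returns 1.0 where x >= a, so w is 1.0 where A <= threshold.
    float4 w = step(A, t);

    // lerp(C, B, w) = C + w * (B - C): gives B where w is 1.0 and C where w is 0.0.
    return lerp(C, B, w);
}
```

One difference to keep in mind: the lerp version evaluates B - C for every component, so a NaN or infinity in the input that is not selected can still end up in the result, which the ?: version avoids.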