I want to call an exclusive scan function from inside a kernel that does a radix sort. But the exclusive scan only needs half of the threads to do its work.
The exclusive scan algorithm has several __syncthreads() calls in it. If I put a statement like this at the start of the scan function:
if (threadIdx.x >= NTHREADS/2) return;
the threads that return will never reach the __syncthreads() calls inside the exclusive scan, which is not allowed, since every thread in the block has to reach each barrier. Is there some way around this problem? I do have the call to the exclusive scan surrounded by __syncthreads() calls.
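
One workaround I can imagine is to drop the early return, let every thread call the scan function, and guard only the work with a predicate so that each __syncthreads() sits outside the if. Below is a minimal sketch of that idea, not my real code. The names exclusive_scan_half, scan_kernel and run_scan are made up for illustration, NTHREADS = 8 is a placeholder value, and I am assuming a Blelloch-style scan over NTHREADS elements in shared memory with a single block.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define NTHREADS 8   // placeholder block size (assumption); must be a power of two

// Hypothetical exclusive scan (Blelloch style) over NTHREADS ints in shared memory.
// Only threads with threadIdx.x < NTHREADS/2 do any work, but ALL threads of the
// block call this function, so every __syncthreads() is reached by every thread.
__device__ void exclusive_scan_half(int *data)
{
    const int  tid    = threadIdx.x;
    const bool active = tid < NTHREADS / 2;   // predicate instead of an early return

    // Up-sweep (reduce) phase.
    for (int stride = 1; stride < NTHREADS; stride *= 2) {
        const int idx = (tid + 1) * stride * 2 - 1;
        if (active && idx < NTHREADS)
            data[idx] += data[idx - stride];
        __syncthreads();                      // barrier is outside the if
    }

    // Clear the last element before the down-sweep.
    if (tid == 0)
        data[NTHREADS - 1] = 0;
    __syncthreads();

    // Down-sweep phase.
    for (int stride = NTHREADS / 2; stride >= 1; stride /= 2) {
        const int idx = (tid + 1) * stride * 2 - 1;
        if (active && idx < NTHREADS) {
            const int t = data[idx - stride];
            data[idx - stride] = data[idx];
            data[idx] += t;
        }
        __syncthreads();                      // barrier is outside the if
    }
}

// Stand-in for the radix sort kernel: it only loads, scans and stores.
__global__ void scan_kernel(int *d)
{
    __shared__ int s[NTHREADS];
    s[threadIdx.x] = d[threadIdx.x];
    __syncthreads();
    exclusive_scan_half(s);                   // called by every thread in the block
    d[threadIdx.x] = s[threadIdx.x];
}

// Host helper for trying the sketch; h must hold exactly NTHREADS values.
// Error checking is omitted to keep the sketch short.
std::vector<int> run_scan(std::vector<int> h)
{
    int *d = nullptr;
    cudaMalloc(&d, NTHREADS * sizeof(int));
    cudaMemcpy(d, h.data(), NTHREADS * sizeof(int), cudaMemcpyHostToDevice);
    scan_kernel<<<1, NTHREADS>>>(d);
    cudaMemcpy(h.data(), d, NTHREADS * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

With this, run_scan({1, 2, 3, 4, 5, 6, 7, 8}) should give {0, 1, 3, 6, 10, 15, 21, 28}. The upper half of the threads sits idle at the barriers instead of exiting, which is the part I am unsure is the best approach.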