I have a CUDA kernel with a bunch of loops I want to unroll. Right now I do:
__global__ void mykernel(int* in, int* out, int baz) {
    #pragma unroll
    for (int i = 0; i < 4; i++) {
        foo();
    }
    /* ... */
    #pragma unroll
    for (int i = 0; i < 6; i++) {
        bar();
    }
}
and so on. I want to tell (or at least hint to) the compiler to unroll all of these loops without writing a separate hint for each one. However, I don't want to unroll every loop in the file, only the loops in this function.
If this were GCC, I could do:
__attribute__((optimize("unroll-loops")))
void mykernel(int* in, int* out, int baz) {
    for (int i = 0; i < 4; i++) {
        foo();
    }
    /* ... */
    for (int i = 0; i < 6; i++) {
        bar();
    }
}
Or I could use GCC's option push-and-pop pragmas (#pragma GCC push_options / #pragma GCC pop_options) around the function. Is there something equivalent I can do with CUDA?
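For reference, the GCC push-and-pop approach mentioned above looks roughly like the sketch below. It is host-only C++ for GCC, not CUDA. The names `mykernel_host`, `calls`, and `run_example`, as well as the bodies of `foo` and `bar`, are placeholders made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real loop bodies.
static int calls = 0;
static void foo() { calls += 1; }
static void bar() { calls += 10; }

// Save the current optimization options, then request loop unrolling
// for every function defined until the matching pop_options.
#pragma GCC push_options
#pragma GCC optimize("unroll-loops")
void mykernel_host() {
    for (int i = 0; i < 4; i++) {
        foo();
    }
    for (int i = 0; i < 6; i++) {
        bar();
    }
}
#pragma GCC pop_options
// Functions defined from here on use the original options again.

// Small driver: 4 calls to foo (+1 each) and 6 calls to bar (+10 each).
int run_example() {
    calls = 0;
    mykernel_host();
    assert(calls == 64);
    return calls;
}
```

The pragmas only scope the request to the functions between push and pop; whether GCC actually unrolls a given loop is still up to the optimizer. Compilers that do not recognise these pragmas ignore them, possibly with a warning.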