
I have a CUDA kernel with a bunch of loops I want to unroll. Right now I do:

__global__ void mykernel(int* in, int* out, int baz) {
    #pragma unroll
    for(int i = 0; i < 4; i++) {
        foo();
    }
    /* ... */
    #pragma unroll
    for(int i = 0; i < 6; i++) {
        bar();
    }
}

et cetera. I want to hint to my C/C++ (CUDA) compiler that it should unroll all of these loops, without needing a separate hint before each one. However, I don't want it to unroll every loop in the file, just the loops in this function.

If this were GCC, I could do:

__attribute__((optimize("unroll-loops")))
void mykernel(int* in, int* out, int baz) {
    for(int i = 0; i < 4; i++) {
        foo();
    }
    /* ... */
    for(int i = 0; i < 6; i++) {
        bar();
    }
}

Or I could use GCC's option pushing-and-popping (sketched below). Is there something equivalent I can do with CUDA?
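For completeness, this is roughly what I mean by the push-and-pop variant (same placeholder calls as above):

#pragma GCC push_options
#pragma GCC optimize ("unroll-loops")

void mykernel(int* in, int* out, int baz) {
    for(int i = 0; i < 4; i++) {
        foo();
    }
    /* ... */
    for(int i = 0; i < 6; i++) {
        bar();
    }
}

#pragma GCC pop_options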

einpoklum

1 Answer

#pragma unroll is the only mechanism for requesting unrolling that is documented in the CUDA C Programming Guide (version 5.5), and it must be placed immediately before each loop it applies to. However, the compiler unrolls all "small loops with a known trip count" by default, so you may not need the unroll directives in your first example at all.
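For illustration, the directive also accepts an optional unroll factor, so each loop can be tuned individually. Here is a minimal sketch reusing the placeholders from your question:

__global__ void mykernel(int* in, int* out, int baz) {
    #pragma unroll       // fully unrolled: the trip count is a compile-time constant
    for(int i = 0; i < 4; i++) {
        foo();
    }
    #pragma unroll 2     // unroll by a factor of 2
    for(int i = 0; i < 6; i++) {
        bar();
    }
    #pragma unroll 1     // a factor of 1 prevents the compiler from unrolling this loop
    for(int i = 0; i < baz; i++) {
        foo();
    }
}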

I don't think controlling unrolling at the function level would be all that useful. You should probably initially rely on the compiler to select the best amount of unrolling and then tweak each loop separately if profiling indicates that it could help.

Roger Dahl
  • What constitutes a 'small loop'? – einpoklum Dec 18 '13 at 21:29
  • I don't think NVIDIA publishes the heuristics for what gets automatically unrolled, but in addition to the trip count, factors like the number of instructions in the loop and the target compute capability may be taken into account. @njuffa gives some useful information [here](http://stackoverflow.com/questions/13222165/in-what-types-of-loops-is-it-best-to-use-the-pragma-unroll-directive-in-cuda). – Roger Dahl Dec 19 '13 at 04:53