Questions tagged [loop-unrolling]

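Loop unrolling is a loop optimization that replicates the loop body so each iteration does more work, reducing loop-control overhead (counter updates and branches) at the cost of larger code.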
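As a rough illustration of what the tag covers, here is a minimal sketch in C that contrasts a plain summation loop with a hand-unrolled version. The function names `sum_rolled` and `sum_unrolled4` and the unroll factor of 4 are assumptions chosen for the example, not taken from any question listed here.

```c
#include <stddef.h>

/* Plain loop: one addition and one compare-and-branch per element. */
long sum_rolled(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Same computation unrolled by a factor of 4 (hypothetical factor):
   the main loop handles four elements per iteration, and a remainder
   loop handles the 0 to 3 elements left over when n is not a
   multiple of 4. */
long sum_unrolled4(const int *a, size_t n) {
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Whether the unrolled version is actually faster depends on the compiler, the optimization flags, and the target; optimizing compilers often perform this transformation themselves, which is what several of the questions below are about.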

164 questions
6 votes, 2 answers

Loop unroll (with bitwise operations)

I am writing a Linux kernel driver (for ARM), and in an IRQ handler I need to check the interrupt bits: bit 0/16 is the endpoint 0 In/Out interrupt (very likely, with In more likely), bit 1/17 is the endpoint 1 In/Out interrupt, ..., bit 15/31 is endpoint 15…
Alvin Wong
5 votes, 1 answer

How to give the compiler a hint about the maximum number of times a loop will run

// if I know that in_x will never be bigger than Max template void foo(unsigned in_x) { unsigned cap = Max; // I can tell the compiler this loop will never run more than log(Max) times for (; cap != 0 && in_x != 0; cap…
BlueWanderer
5 votes, 1 answer

`Out of resources` error while doing loop unrolling

When I increase the unrolling from 8 to 9 loops in my kernel, it breaks with an out of resources error. I read in "How do I diagnose a CUDA launch failure due to being out of resources?" that a mismatch of parameters and an overuse of registers could…
Framester
5 votes, 2 answers

Vectorization: when is it worth manually unrolling loops?

I would like a general understanding of when I can expect a compiler to vectorize a loop and when it is worth unrolling the loop myself to help it decide to use vectorization. I understand the details are very important (what compiler,…
luca
5 votes, 2 answers

C loop unrolling optimization performance

First: I know what loop optimization is and how it works, yet I found a case where I cannot explain the results. I created a prime number checker that calls modulo on each number from 2 to n - 1, so no algorithmic optimizations. EDIT: I know that…
jklmnn
5 votes, 4 answers

Unrolling for loops in a special-case function

So I'm trying to optimize some code. I have a function with a variable-sized loop. However, for efficiency's sake I want to make loops of size 1, 2 and 3 special cases that are completely unrolled. My approach so far is to declare the loop…
p clark
5 votes, 1 answer

Loop unrolling in Metal kernels

I need to force the Metal compiler to unroll a loop in my kernel compute function. So far I've tried to put #pragma unroll(num_times) before a for loop, but the compiler ignores that statement. It seems that the compiler doesn't unroll the loops…
sarasvati
5 votes, 2 answers

Porting Duff's device from C to JavaScript

I have this kind of Duff's device in C and it works fine (format text as money): #include #include char *money(const char *src, char *dst) { const char *p = src; char *q = dst; size_t len; len = strlen(src); …
David Ranieri
5 votes, 3 answers

Does unrolling loops in x86-64 actually make code faster?

I assume everyone knows what "unrolling loops" means. Just in case, I'll give a concrete example in a moment. The question I will ask is: does unrolling loops in x86-64 assembly language actually make code faster? I will explain why I begin to…
honestann
5 votes, 2 answers

C++ Loop Unrolling Performance Difference (Project Euler)

I have a question about a Project Euler problem and optimization using loop unrolling. Problem description: 2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder. What is the smallest positive…
Blaine
4 votes, 1 answer

Does the compiler only unroll the outer loop completely?

I try to compile this code and use loop-specific pragmas to tell the compiler how many times to unroll a counted loop. #include int main() { std::vector v(8192); #pragma GCC unroll 8 // 16 for (int i = 0; i < 16; i++) { for…
4 votes, 2 answers

Unrolling For Loop in C

I am trying to unroll this loop by a factor of 2. for(i=0; i<100; i++){ x[i] = y[i] + z[i]; z[i] = y[i] + a[i]; z[i+1] = y[i] * a[i]; } I have it unrolled to: for(i=0; i<100; i+=2){ x[i] = y[i] + z[i]; x[i+1] = y[i+1] + z[i+1]; z[i]…
4 votes, 1 answer

Loop unrolling - G++ vs. Clang++

I was wondering whether it is worth aiding the compiler with templates to unroll a simple loop. I prepared the following test: #include #include #include class TNode { public: void Assemble(); void Assemble(TNode…
metalfox
4 votes, 1 answer

Manual loop unrolling with known maximum size

Please take a look at this code in an OpenCL kernel: uint point_color = 4278190080; float point_percent = 1.0f; float near_pixel_size = (...); float far_pixel_size = (...); float delta_pixel_size = far_pixel_size - near_pixel_size; float3 near =…
Izhido
4 votes, 4 answers

GCC inline asm NOP loop not being unrolled at compile time

Venturing out of my usual VC++ realm into the world of GCC (via MINGW32). Trying to create a Windows PE that consists largely of NOPs, à la: for(i = 0; i < 1000; i++) { asm("nop"); } But either I'm using the wrong syntax or the compiler is…
Rushyo