Questions tagged [loop-unrolling]

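Loop unrolling is a loop optimization that replicates the loop body several times per iteration, reducing loop-control overhead (counter updates and branches) at the cost of larger code.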

164 questions
2
votes
1 answer

Can I tell nvcc to apply #pragma unroll to all loops in a function?

I have a CUDA kernel with a bunch of loops I want to unroll. Right now I do: void mykernel(int* in, int* out, int baz) { #pragma unroll for(int i = 0; i < 4; i++) { foo(); } /* ... */ #pragma unroll for(int i = 0;…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
2
votes
1 answer

XNA optimizations - Loop Unrolling?

I'm making an XNA game and I wonder if there is a way of optimizing some loops. For example: I have a Map class that contains a collection of tiles, so in the Map Update() I just call every tile's Update() // Update method in Map Class …
mRt
  • 1,223
  • 6
  • 18
  • 32
2
votes
1 answer

Can I make #pragma unroll accept macros/expressions rather than plain numbers?

I am trying to tell my compiler to unroll a loop for me using #pragma unroll. However, the number of iterations is determined by a compile-time variable, so the loop needs to be unrolled that many times. Like this: #define ITEMS 4 #pragma unroll…
Yellow
  • 3,955
  • 6
  • 45
  • 74
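For the `#define ITEMS 4` case in the excerpt above, one commonly used workaround is the standard `_Pragma` operator, which (unlike a `#pragma` line) can be produced by a macro, so the count is expanded before it is turned into the pragma string. The sketch below is only an illustration under assumptions: the names `PRAGMA_STR`, `UNROLL` and `sum_items` are made up here, and the pragma spellings assumed are `#pragma GCC unroll n` (GCC 8 and later) and `#pragma unroll n` (Clang). The asker's compiler is not named in the excerpt and may need a different spelling, so any other compiler falls back to a no-op.

```cpp
#define ITEMS 4

// Helper that stringifies its (already macro-expanded) argument
// and hands it to the standard _Pragma operator.
#define PRAGMA_STR(x) _Pragma(#x)

// Assumed spellings; other compilers get an empty macro.
#if defined(__clang__)
#  define UNROLL(n) PRAGMA_STR(unroll n)
#elif defined(__GNUC__) && __GNUC__ >= 8
#  define UNROLL(n) PRAGMA_STR(GCC unroll n)
#else
#  define UNROLL(n)
#endif

// Hypothetical example function: sums the first ITEMS elements.
int sum_items(const int* data) {
    int total = 0;
    UNROLL(ITEMS)  // ITEMS is expanded to 4 before stringification
    for (int i = 0; i < ITEMS; ++i) {
        total += data[i];
    }
    return total;
}
```

The pragma is only a hint, so the function returns the same result whether or not the compiler actually unrolls the loop.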
1
vote
1 answer

unrolling a while loop

Original code: while(i<30){ /* do something */ i++; } Unrolled while loop: while(i<15){ /* do something twice */ i+=2; } Can't we unroll it as shown above? Do we always have to do it like http://en.wikipedia.org/wiki/Loop_unrolling ?
klijo
  • 15,761
  • 8
  • 34
  • 49
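A small sketch can make the difference between the two loops in the excerpt above concrete by counting how often the body runs. It assumes `i` starts at a known value (the excerpt does not say what `i` is initially), and the function names are invented for illustration. Halving the bound while also doubling the step runs the body fewer times; the usual form keeps the bound and adds a remainder step for odd trip counts.

```cpp
// Rolled loop: the body runs once for every i in [start, 30).
int count_rolled(int start) {
    int calls = 0;
    int i = start;
    while (i < 30) { ++calls; i++; }
    return calls;
}

// Variant from the question: bound halved AND step doubled.
int count_halved_bound(int start) {
    int calls = 0;
    int i = start;
    while (i < 15) { ++calls; ++calls; i += 2; }
    return calls;
}

// Unrolled by 2 with the original bound, plus a remainder step
// that handles a leftover single iteration.
int count_unrolled(int start) {
    int calls = 0;
    int i = start;
    while (i + 1 < 30) { ++calls; ++calls; i += 2; }
    if (i < 30) { ++calls; i++; }
    return calls;
}
```

Starting from 0, the rolled and properly unrolled versions both run the body 30 times, while the halved-bound version runs it only 16 times.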
1
vote
0 answers

#pragma GCC unroll with compile-time argument

Is there a way to unroll a loop in GCC based on compile-time (e.g., template) parameter? The following does not compile, unless I replace unroll(N) with a concrete integer like unroll(8) template void fun () { #pragma GCC unroll(N) …
user2052436
  • 4,321
  • 1
  • 25
  • 46
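When a pragma will not take a template parameter, a different technique that is sometimes substituted is compile-time recursion, where the template machinery itself generates one call per iteration. The sketch below is that swapped-in technique, not a fix for the pragma in the question; `UnrollImpl`, `unroll` and `sum_first` are hypothetical names, and whether the calls end up fully inlined still depends on the optimizer.

```cpp
// Recursive case: call f(I), then continue with I + 1.
template <int I, int N>
struct UnrollImpl {
    template <typename F>
    static void run(F& f) {
        f(I);
        UnrollImpl<I + 1, N>::run(f);
    }
};

// Base case: I == N, nothing left to do.
template <int N>
struct UnrollImpl<N, N> {
    template <typename F>
    static void run(F&) {}
};

// Calls f(0), f(1), ..., f(N - 1) with N fixed at compile time.
template <int N, typename F>
void unroll(F f) {
    UnrollImpl<0, N>::run(f);
}

// Hypothetical usage: sum the first N elements of data.
template <int N>
int sum_first(const int* data) {
    int total = 0;
    unroll<N>([&](int i) { total += data[i]; });
    return total;
}
```

This only needs C++11; with C++17, a fold expression over `std::index_sequence` is another way to express the same idea.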
1
vote
1 answer

Why do 2 operations without loop unrolling and with loop unrolling give the same performance?

I am studying memory in C++, and one thing puzzles me. I am trying 2 different methods of summing an array. In one, I access only 1 index at a time and increment i by 1. In the other, I access 5 indices of the array at a time…
Liu Bei
  • 565
  • 3
  • 9
  • 19
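The two methods described in the excerpt above might look roughly like the following sketch. The asker's actual code is not shown, so the function names, the `int` element type and the `long long` accumulator are assumptions. The unrolled version needs a cleanup loop for lengths that are not a multiple of 5.

```cpp
#include <cstddef>

// One element per iteration.
long long sum_simple(const int* a, std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += a[i];
    }
    return total;
}

// Five elements per iteration, then a cleanup loop for the rest.
long long sum_unrolled5(const int* a, std::size_t n) {
    long long total = 0;
    std::size_t i = 0;
    for (; i + 5 <= n; i += 5) {
        // The cast widens the whole expression to long long.
        total += static_cast<long long>(a[i]) + a[i + 1] + a[i + 2]
               + a[i + 3] + a[i + 4];
    }
    for (; i < n; ++i) {
        total += a[i];
    }
    return total;
}
```

Two plausible (unconfirmed) reasons for seeing no difference: an optimizing compiler may already unroll or vectorize the simple loop, and for large arrays the loop may be limited by memory bandwidth rather than loop overhead.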
1
vote
1 answer

Why is my code giving time-limit exceeded while a near identical code works just fine in LeetCode?

Ref: https://leetcode.com/problems/word-search/submissions/ Brief problem statement: Given a matrix of characters and a string, does the string exist in this matrix? Please refer to the above link for details. Solution-1 Gives time-limit…
1
vote
2 answers

gcc optimisation flag changes runtime behaviour

This is C code: #include <stdio.h> int main() { int i = 1; while (i) i++; printf("%d\n", i); } Running: miglanigursimar@Miglanis-MacBook-Pro 002 % gcc main.c miglanigursimar@Miglanis-MacBook-Pro 002 %…
tony
  • 133
  • 4
1
vote
1 answer

How to optimize this code with unrolling factor 3?

void randomImprovedfunction(double a[], double p[], long n) { long i; double last_v, v; last_v = p[0] = a[0]; for (i=1; i…
1
vote
1 answer

How will unrolling affect the cycles per element (CPE) count?

How do I calculate CPE (cycles per element) with these code snippets? What is the difference in the CPE between the 2 given code snippets? I have this piece of code: void randomFunction(float a[],float Tb[],float c[],long int n){ int…
Megan Darcy
  • 530
  • 5
  • 15
1
vote
1 answer

Why does loop unrolling bring so much speedup on ARM Cortex-A53?

I'm playing around with loop unrolling with the following code on an ARM Cortex-A53 processor running in AArch64 state: void do_something(uint16_t* a, uint16_t* b, uint16_t* c, size_t array_size) { for (int i = 0; i < array_size; i++) { a[i] =…
Da Teng
  • 551
  • 4
  • 21
1
vote
0 answers

I am trying to speed up a nested for loop via OpenMP and unrolling, but it runs slower. Why?

I am trying to speed up a simple nested loop: for (int k = 0; k < n; k++) for (int i = 0; i < n - k; ++i) c[k] += a[i + k] * b[i]; First I tried to use OpenMP (since this loop is not well balanced, I added a little modification): #pragma…
Cino
  • 83
  • 1
  • 7
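For the nested loop quoted above, a manual unroll of the inner loop by 4 with separate accumulators could be sketched as follows. The OpenMP part of the question is truncated, so it is left out here; the element type `double`, the use of `std::vector`, a `c` that starts at zero, and the function names are all assumptions made for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Reference version: same loop nest as in the question,
// with c assumed to start at zero. Requires b.size() >= a.size().
std::vector<double> correlate_simple(const std::vector<double>& a,
                                     const std::vector<double>& b) {
    const std::size_t n = a.size();
    std::vector<double> c(n, 0.0);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n - k; ++i)
            c[k] += a[i + k] * b[i];
    return c;
}

// Inner loop unrolled by 4 with four independent partial sums,
// followed by a cleanup loop for the remaining 0 to 3 elements.
std::vector<double> correlate_unrolled(const std::vector<double>& a,
                                       const std::vector<double>& b) {
    const std::size_t n = a.size();
    std::vector<double> c(n, 0.0);
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t len = n - k;
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        std::size_t i = 0;
        for (; i + 4 <= len; i += 4) {
            s0 += a[i + k]     * b[i];
            s1 += a[i + k + 1] * b[i + 1];
            s2 += a[i + k + 2] * b[i + 2];
            s3 += a[i + k + 3] * b[i + 3];
        }
        double sum = (s0 + s1) + (s2 + s3);
        for (; i < len; ++i)
            sum += a[i + k] * b[i];
        c[k] = sum;
    }
    return c;
}
```

Splitting the sum across several accumulators changes the order of floating-point additions, so results can differ in the last bits from the simple loop for general inputs; whether it is actually faster has to be measured.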
1
vote
2 answers

Profiling a benchmark compiled for the SPARC v8 on an x86

I'm trying to make a (small) improvement to the leon3 processor (instruction set is SPARC v8) for an academic exercise. Before I decide what to improve, I want to profile a couple of benchmark programs that I want to tailor the improvements to. I…
ArjunShankar
  • 23,020
  • 5
  • 61
  • 83
1
vote
2 answers

Speeding up a do-while loop with loop unrolling

I am trying to speed up code in a function that may be called many times over (maybe more than a million). The code has to do with setting two variables to random numbers and finding squared distance. My first idea for this is loop unrolling but I…
Saul
  • 311
  • 1
  • 2
  • 10
1
vote
1 answer

Manual loop unrolling within a C++ Introsort Runs Incorrectly

I'm writing a simple in-place introsort in C++, in which I'm trying to manually unroll a loop within the partition function for the sake of optimization. The program, which I'll include below, compiles but isn't able to sort a random list…
jaytlang
  • 23
  • 3