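Loop unrolling is a loop optimization that replicates the loop body so each pass does the work of several iterations, reducing loop-control overhead.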
Questions tagged [loop-unrolling]
164 questions
2
votes
1 answer
Can I tell nvcc to apply #pragma unroll to all loops in a function?
I have a CUDA kernel with a bunch of loops I want to unroll. Right now I do:
void mykernel(int* in, int* out, int baz) {
#pragma unroll
for(int i = 0; i < 4; i++) {
foo();
}
/* ... */
#pragma unroll
for(int i = 0;…

einpoklum
- 118,144
- 57
- 340
- 684
2
votes
1 answer
XNA optimizations - Loop Unrolling?
I'm making an XNA game and I wonder if there is a way of optimizing some loops. For example:
I have a Map class that contains a collection of tiles, so in the Map's Update() I just call every tile's Update()
// Update method in Map Class
…

mRt
- 1,223
- 6
- 18
- 32
2
votes
1 answer
Can I make #pragma unroll accept macros/expressions rather than plain numbers?
I am trying to tell my compiler to unroll a loop for me using #pragma unroll. However, the number of iterations is determined by a compile-time variable, so the loop needs to be unrolled that many times. Like this:
#define ITEMS 4
#pragma unroll…

Yellow
- 3,955
- 6
- 45
- 74
1
vote
1 answer
unrolling a while loop
original code
while(i<30){
// do something
i++;
}
unrolled while loop
while(i<15){
// do something twice
i+=2;
}
Can't we unroll it as shown above? Do we always have to do it as in http://en.wikipedia.org/wiki/Loop_unrolling ?
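A minimal sketch of the point at issue, assuming `i` starts at 0 (the excerpt does not say) and using a hypothetical `do_something` that only counts its calls. As posted, the second loop both halves the bound and doubles the step, so the body would run 16 times rather than 30. Unrolling by 2 keeps the original bound and adds a cleanup step for an odd trip count:

```c
/* Hypothetical stand-in for "do something": counts how often it runs. */
static void do_something(long *calls) { ++*calls; }

/* Reference: the original rolled loop. */
long run_rolled(int start, int limit) {
    long calls = 0;
    int i = start;
    while (i < limit) {
        do_something(&calls);
        i++;
    }
    return calls;
}

/* Unrolled by 2: the main loop handles pairs of iterations,
   and the trailing `if` handles a single leftover iteration. */
long run_unrolled2(int start, int limit) {
    long calls = 0;
    int i = start;
    while (i + 1 < limit) {
        do_something(&calls);
        do_something(&calls);
        i += 2;
    }
    if (i < limit) {
        do_something(&calls);
    }
    return calls;
}

/* The variant from the question: bound halved AND step doubled. */
long run_as_posted(int start) {
    long calls = 0;
    int i = start;
    while (i < 15) {
        do_something(&calls);
        do_something(&calls);
        i += 2;
    }
    return calls;
}
```

With `start = 0` and `limit = 30`, `run_rolled` and `run_unrolled2` perform the same number of calls, while `run_as_posted` does not. The cleanup step only matters when the trip count is not a multiple of the unroll factor, which is why general descriptions of the technique include it.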

klijo
- 15,761
- 8
- 34
- 49
1
vote
0 answers
#pragma GCC unroll with compile-time argument
Is there a way to unroll a loop in GCC based on compile-time (e.g., template) parameter?
The following does not compile, unless I replace unroll(N) with a concrete integer like unroll(8)
template
void fun ()
{
#pragma GCC unroll(N)
…

user2052436
- 4,321
- 1
- 25
- 46
1
vote
1 answer
Why do 2 operations without loop unrolling and with loop unrolling give the same performance?
I am studying memory in C++, but there is one thing that confuses me.
I am trying 2 different methods for array sum. In one, I access only 1 index at a time and increment i by 1. In the other, I access 5 indices of the array at a time…
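The question's actual code is cut off here, so the following is only a sketch of the two summation styles it describes, assuming an `int` array; the function names `sum_simple` and `sum_unrolled5` are hypothetical:

```c
#include <stddef.h>

/* One element per iteration. */
long sum_simple(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Five elements per iteration, plus a cleanup loop for the
   remaining n % 5 elements. */
long sum_unrolled5(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 5 <= n; i += 5) {
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3] + a[i + 4];
    }
    for (; i < n; i++) {
        s += a[i];
    }
    return s;
}
```

Both functions return the same sum. One plausible reason for seeing equal timings is that an optimizing compiler may unroll or vectorize the simple loop on its own, and that a sum over a large array is often limited by memory bandwidth rather than by loop overhead; whether that applies depends on the compiler flags and array size used in the question.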

Liu Bei
- 565
- 3
- 9
- 19
1
vote
1 answer
Why is my code giving time-limit exceeded while a near identical code works just fine in LeetCode?
Ref: https://leetcode.com/problems/word-search/submissions/
Brief problem statement: given a matrix of characters and a string, does the string exist in this matrix? Please refer to the above link for details.
Solution-1 Gives time-limit…

Ramasamy Kandasamy
- 647
- 7
- 9
1
vote
2 answers
gcc optimisation flag changes runtime behaviour
This is C code:
#include <stdio.h>
int main() {
int i = 1;
while (i) i++;
printf("%d\n", i);
}
running:
miglanigursimar@Miglanis-MacBook-Pro 002 % gcc main.c
miglanigursimar@Miglanis-MacBook-Pro 002 %…

tony
- 133
- 4
1
vote
1 answer
How to optimize this code with unrolling factor 3?
void randomImprovedfunction(double a[], double p[], long n)
{
long i;
double last_v, v;
last_v = p[0] = a[0];
6 for (i=1; i

Megan Darcy
- 530
- 5
- 15
1
vote
1 answer
How will unrolling affect the cycles per element (CPE) count?
How do I calculate CPE (cycles per element) with these code snippets?
What is the difference in the CPE between the 2 given code snippets?
I have this piece of code
void randomFunction(float a[],float Tb[],float c[],long int n){
int…

Megan Darcy
- 530
- 5
- 15
1
vote
1 answer
Why does loop unrolling bring so much speedup on ARM Cortex-A53?
I'm playing around with loop unrolling with the following code on an ARM Cortex-A53 processor running in AArch64 state:
void do_something(uint16_t* a, uint16_t* b, uint16_t* c, size_t array_size)
{
for (int i = 0; i < array_size; i++)
{
a[i] =…

Da Teng
- 551
- 4
- 21
1
vote
0 answers
I am trying to speed up a nested for loop via OpenMP and unrolling, but it runs slowly. Why?
I am trying to speed up a simple nested loop:
for (int k = 0; k < n; k++)
for (int i = 0; i < n - k; ++i)
c[k] += a[i + k] * b[i];
First I tried to use OpenMP (since this loop is not well balanced, I added a little modification):
#pragma…

Cino
- 83
- 1
- 7
1
vote
2 answers
Profiling a benchmark compiled for the SPARC v8 on an x86
I'm trying to make a (small) improvement to the leon3 processor (instruction set is SPARC v8) for an academic exercise. Before I decide what to improve, I want to profile a couple of benchmark programs that I want to tailor the improvements to.
I…

ArjunShankar
- 23,020
- 5
- 61
- 83
1
vote
2 answers
Speeding up a do-while loop with loop unrolling
I am trying to speed up code in a function that may be called many times over (maybe more than a million). The code has to do with setting two variables to random numbers and finding squared distance. My first idea for this is loop unrolling but I…

Saul
- 311
- 1
- 2
- 10
1
vote
1 answer
Manual loop unrolling within a C++ Introsort Runs Incorrectly
I'm writing a simple in-place introsort in C++, in which I'm trying to manually unroll a loop within the partition function for the sake of optimization. The program, which I'll include below, compiles but isn't able to sort a random list…

jaytlang
- 23
- 3