0

I'm having trouble unrolling nested for loops. I understand the concept and I'm trying to put it into practice, but I keep getting tripped up when editing the statements inside my for loops to match the unrolling.

If someone could just show me an efficient unroll and walk me through it that'd be a huge help.

Here is the loop section I want to unroll:

for (i=1 ; i < WIDTH-1 ; ++i) 
{
      for (j = 1 ; j < HEIGHT-1 ; ++j) 
      {
         n = getNeighbors(prv, i, j);    /* This is where I'm confused */
         mask = (prev[i][j] << 1);       
         next[i][j] = !(((n >> prev[i][j]) ^ 3) ^ mask);
      }
}

UPDATE: Would this be correct?

for (i=1 ; i < WIDTH-1 ; i+=4) 
{
      for (j = 1 ; j < HEIGHT-1 ; j+=4) 
      {
         n = getNeighbors(prv, i, j);  
         mask = (prev[i][j] << 1);       
         next[i][j] = !(((n >> prev[i][j]) ^ 3) ^ mask);
         n = getNeighbors(prv, i, j+1);  
         mask = (prev[i][j+1] << 1);       
         next[i][j+1] = !(((n >> prev[i][j+1]) ^ 3) ^ mask);
         n = getNeighbors(prv, i, j+2);  
         mask = (prev[i][j+2] << 1);       
         next[i][j+2] = !(((n >> prev[i][j+2]) ^ 3) ^ mask);
         n = getNeighbors(prv, i, j+3);  
         mask = (prev[i][j+3] << 1);       
         next[i][j+3] = !(((n >> prev[i][j+3]) ^ 3) ^ mask);
      }
      for (j = 1 ; j < HEIGHT-1 ; j+=4) 
      {
         n = getNeighbors(prv, i+1, j);  
         mask = (prev[i+1][j] << 1);       
         next[i+1][j] = !(((n >> prev[i+1][j]) ^ 3) ^ mask);
         n = getNeighbors(prv, i+1, j+1);  
         mask = (prev[i+1][j+1] << 1);       
         next[i+1][j+1] = !(((n >> prev[i+1][j+1]) ^ 3) ^ mask);
         n = getNeighbors(prv, i+1, j+2);  
         mask = (prev[i+1][j+2] << 1);       
         next[i+1][j+2] = !(((n >> prev[i+1][j+2]) ^ 3) ^ mask);
         n = getNeighbors(prv, i+1, j+3);  
         mask = (prev[i+1][j+3] << 1);       
         next[i+1][j+3] = !(((n >> prev[i+1][j+3]) ^ 3) ^ mask);
      }
      for (j = 1 ; j < HEIGHT-1 ; j+=4) 
      {
         n = getNeighbors(prv, i+2, j);  
         mask = (prev[i+2][j] << 1);       
         next[i+2][j] = !(((n >> prev[i+2][j]) ^ 3) ^ mask);
         n = getNeighbors(prv, i+2, j+1);  
         mask = (prev[i+2][j+1] << 1);       
         next[i+2][j+1] = !(((n >> prev[i+2][j+1]) ^ 3) ^ mask);
         n = getNeighbors(prv, i+2, j+2);  
         mask = (prev[i+2][j+2] << 1);       
         next[i+2][j+2] = !(((n >> prev[i+2][j+2]) ^ 3) ^ mask);
         n = getNeighbors(prv, i+2, j+3);  
         mask = (prev[i+2][j+3] << 1);       
         next[i+2][j+3] = !(((n >> prev[i+2][j+3]) ^ 3) ^ mask);
      }
      for (j = 1 ; j < HEIGHT-1 ; j+=4) 
      {
         n = getNeighbors(prv, i+3, j);  
         mask = (prev[i+3][j] << 1);       
         next[i+3][j] = !(((n >> prev[i+3][j]) ^ 3) ^ mask);
         n = getNeighbors(prv, i+3, j+1);  
         mask = (prev[i+3][j+1] << 1);       
         next[i+3][j+1] = !(((n >> prev[i+3][j+1]) ^ 3) ^ mask);
         n = getNeighbors(prv, i+3, j+2);  
         mask = (prev[i+3][j+2] << 1);       
         next[i+3][j+2] = !(((n >> prev[i+3][j+2]) ^ 3) ^ mask);
         n = getNeighbors(prv, i+3, j+3);  
         mask = (prev[i+3][j+3] << 1);       
         next[i+3][j+3] = !(((n >> prev[i+3][j+3]) ^ 3) ^ mask);
      }
}
slippeel
  • what is `prv`? what do you try to achieve by unrolling the loop(s)? do you finally want a single loop or no loop at all? – m.s. May 31 '15 at 18:09
  • Why not just let the compiler take care of unrolling loops for you? – Paul R May 31 '15 at 18:11
  • Are WIDTH and HEIGHT constants? The values are req'd for unrolling. – QuentinUK May 31 '15 at 18:21
  • Sorry for not providing specifics. `prv` is a 2D array. I am trying to learn how to optimize code and achieve faster runtimes. I guess I would want no loop at all, but I would like to see both versions. I am trying to learn it without the compiler's help. WIDTH and HEIGHT are constants. – slippeel May 31 '15 at 18:50

2 Answers

1

Let the loop be:

for(int i = 0; i < x; ++i)
    for(int j = 0; j < y; ++j)
        dosomething(i, j);

It can be unrolled as:

for(int i = 0; i < x; i += 4) {
    for(int j = 0; j < y; j += 4) {
        dosomething(i, j);
        dosomething(i, j + 1);
        dosomething(i, j + 2);
        dosomething(i, j + 3);
    }
    for(int j = 0; j < y; j += 4) {
        dosomething(i + 1, j);
        dosomething(i + 1, j + 1);
        dosomething(i + 1, j + 2);
        dosomething(i + 1, j + 3);
    }
    for(int j = 0; j < y; j += 4) {
        dosomething(i + 2, j);
        dosomething(i + 2, j + 1);
        dosomething(i + 2, j + 2);
        dosomething(i + 2, j + 3);
    }
    for(int j = 0; j < y; j += 4) {
        dosomething(i + 3, j);
        dosomething(i + 3, j + 1);
        dosomething(i + 3, j + 2);
        dosomething(i + 3, j + 3);
    }
}

I'm not sure how much benefit this would have. You should profile your code after unrolling like this.
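The pattern above only covers every cell when the loop bounds are multiples of 4. Here is a minimal sketch of one way to handle the general case, applied to a loop body shaped like the one in the question: only the inner loop is unrolled, and a short cleanup loop picks up the leftover iterations. The grid sizes, `count_neighbors`, `next_state`, `step_plain` and `step_unrolled` are hypothetical names made up for this illustration; `count_neighbors` is only a guess at what `getNeighbors` does. The three statements of the body are wrapped in a small helper so that each unrolled copy is a single line.

```c
#define WIDTH  8   /* hypothetical sizes: the interior is 6 cells wide, */
#define HEIGHT 8   /* which is deliberately not a multiple of 4         */

/* Hypothetical stand-in for getNeighbors: counts the live cells
   among the 8 neighbours of (i, j). */
static int count_neighbors(int g[WIDTH][HEIGHT], int i, int j)
{
    int n = 0;
    for (int di = -1; di <= 1; ++di)
        for (int dj = -1; dj <= 1; ++dj)
            if (di != 0 || dj != 0)
                n += g[i + di][j + dj];
    return n;
}

/* The three statements of the original loop body, kept together. */
static int next_state(int g[WIDTH][HEIGHT], int i, int j)
{
    int n = count_neighbors(g, i, j);
    int mask = g[i][j] << 1;
    return !(((n >> g[i][j]) ^ 3) ^ mask);
}

/* Reference version: plain nested loops over the interior cells. */
void step_plain(int prev[WIDTH][HEIGHT], int next[WIDTH][HEIGHT])
{
    for (int i = 1; i < WIDTH - 1; ++i)
        for (int j = 1; j < HEIGHT - 1; ++j)
            next[i][j] = next_state(prev, i, j);
}

/* Inner loop unrolled by 4, plus a cleanup loop for the remainder. */
void step_unrolled(int prev[WIDTH][HEIGHT], int next[WIDTH][HEIGHT])
{
    for (int i = 1; i < WIDTH - 1; ++i) {
        int j = 1;
        /* Runs only while four full cells remain in this row. */
        for (; j + 3 < HEIGHT - 1; j += 4) {
            next[i][j]     = next_state(prev, i, j);
            next[i][j + 1] = next_state(prev, i, j + 1);
            next[i][j + 2] = next_state(prev, i, j + 2);
            next[i][j + 3] = next_state(prev, i, j + 3);
        }
        /* Leftover cells (0 to 3 of them). */
        for (; j < HEIGHT - 1; ++j)
            next[i][j] = next_state(prev, i, j);
    }
}
```

Both functions should produce the same `next` grid for any `prev`, so comparing their outputs is a cheap sanity check before timing them against each other.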

a_pradhan
  • Thanks for your comment. If I have multiple statements in the loops, like the 3 assignments I have in my code, how would I structure that? I assume that I would do it like how you showed me in your comment, where each assignment is done (for example) 4 times with +1,+2, +3 – slippeel May 31 '15 at 18:55
  • Such unrolling is only possible if `x` and `y` are known to be multiples of `4`. Unrolling the outer loop is much less useful than unrolling the inner one. – chqrlie May 31 '15 at 19:30
0

Just an example:

int r[3][3];

// loop version
for (int i = 0; i < 3; i++) {
    for (int j = 0; j < 3; j++) {
        r[i][j] = i + j;
    }
}

// unrolled version
r[0][0] = 0;
r[0][1] = 1;
r[0][2] = 2;
r[1][0] = 1;
r[1][1] = 2;
r[1][2] = 3;
r[2][0] = 2;
r[2][1] = 3;
r[2][2] = 4;

Please note that such complete unrolling is practical only for vectors or matrices whose size is known at compile time. Also note that recent compilers can often unroll such loops by themselves.
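As an illustration of leaving the work to the compiler, here is a small sketch that keeps the loop form and adds an unrolling hint. It assumes GCC 8 or later, where `#pragma GCC unroll n` is documented as a loop-specific pragma; the preprocessor guard skips the hint on other compilers. The names `N` and `fill_sum_table` are made up for this example, and whether the hint actually changes the generated code depends on the compiler and optimization level.

```c
#define N 3   /* hypothetical compile-time size, matching r[3][3] above */

/* Same computation as the loop version above, kept as a loop. */
void fill_sum_table(int r[N][N])
{
    for (int i = 0; i < N; i++) {
        /* GCC-specific request to unroll the inner loop 3 times. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#pragma GCC unroll 3
#endif
        for (int j = 0; j < N; j++) {
            r[i][j] = i + j;
        }
    }
}
```

The source stays as short as the loop version, and the result can be compared against the hand-unrolled assignments to confirm they agree.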

dlask