
I have the following nested for loop:

```c
int n = 8;
int counter = 0;

for (int i = 0; i < n; i++)
{
    for (int j = i + 1; j < n; j++)
    {
        printf("(%d, %d)\n", i, j);
        counter++;
    }
}
```

This prints (0, 1) to (6, 7) as expected, and the printf() statement is run 28 times, as indicated by counter.

I have been set the task of improving the efficiency of this code by improving its locality (this is test code: the value of n in the actual program is much larger, and i and j are used to index into two 1D arrays), and I have employed what I believe to be a fairly standard technique, loop tiling (blocking):

```c
int chunk = 4;

for(int i = 0; i < n; i+=chunk)
    for(int j = 0; j < n; j+=chunk)
        for (int i_chunk = 0; i_chunk < chunk; i_chunk++)
            for (int j_chunk = i_chunk + 1; j_chunk < chunk; j_chunk++)
            {
                printf("(%d, %d)\n", i+i_chunk, j+j_chunk);
                counter++;
            }
```

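However, here printf() is only run 24 times. Because j_chunk always starts at i_chunk + 1, where before the j loop printed (0, 1) to (0, 7), the two iterations of the j_chunk loop where i + i_chunk == 0 now print (0, 1) to (0, 3) and (0, 5) to (0, 7), missing (0, 4). Other pairs are skipped the same way, and the tile with i = 4, j = 0 prints pairs such as (4, 1), where the first index is larger than the second.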

I understand why it is doing this, but I can't for the life of me come up with a solution; any help would be appreciated.

BodneyC
  • Are you sure this is correct `for (int j_chunk = i_chunk + 1; j_chunk < chunk; j++)`? Shouldn't be: `for (int j_chunk = i_chunk + 1; j_chunk < chunk; j_chunk++)`? – Amadeus Oct 22 '17 at 18:17
  • Yes, you're right, it was a typo when I copied the code into SO. Thanks for pointing it out. – BodneyC Oct 22 '17 at 18:22
  • Before redesigning your code, have you tried changing the optimization settings for your compiler? Are you compiling in release mode? – Thomas Matthews Oct 22 '17 at 18:33
  • You may be able to gain some performance by using *loop unrolling* or making the data accesses more data cache friendly. – Thomas Matthews Oct 22 '17 at 18:35
  • Unfortunately, this is an exercise in the theory behind cache locality; it needs to be in the code and chunked off in this way. Thanks anyway. – BodneyC Oct 22 '17 at 18:36
  • Typically, the most expensive operations are division, branching (looping, function calls), and I/O. There is not a lot you can do in your first example to reduce these items. – Thomas Matthews Oct 22 '17 at 18:37
  • Look at the generated assembly code for the first example, with optimizations set on high. – Thomas Matthews Oct 22 '17 at 18:39
  • Sadly, it doesn't matter how well the compiler can optimize the code, because the changes need to be in the source and the loop chunked off in this style. – BodneyC Oct 22 '17 at 18:45
  • You're not accessing memory (except for the stack and instruction area), so there is no reason to optimize. Chunking for the data cache, when you aren't using the data cache, is pointless. – Thomas Matthews Oct 22 '17 at 18:46
  • I'm not accessing memory in this test code because it is test code and not the real code, in which I do access memory. This test code is just to sort out the iterations of the for loops. – BodneyC Oct 22 '17 at 18:48
  • The data cache is involved when you use classes, structs or arrays, provided the classes and structs have data members. Code like your first example would use local stack memory, worst case. Best case, the compiler uses registers. Registers are more efficient than data cache memory. – Thomas Matthews Oct 22 '17 at 18:49
  • "`i` and `j` are used to index into two 1d arrays"; again, this is test code. – BodneyC Oct 22 '17 at 18:52

1 Answer


First you need to make sure that j is never in a lower chunk than i, so your outer loops should be:

```c
for(int i = 0; i < n; i+=chunk)
    for(int j = i; j < n; j+=chunk)
```

Then you need different behaviour depending on whether i and j are in the same chunk. If they are, j_chunk always needs to be larger than i_chunk; otherwise you need to go through all possible combinations:

```c
if(i==j)
{
    for (int i_chunk = 0; i_chunk < chunk; i_chunk++)
    {
        for (int j_chunk = i_chunk + 1; j_chunk < chunk; j_chunk++)
        {
            printf("(%d, %d)\n", i+i_chunk, j+j_chunk);
            counter++;
        }
    }
}
else
{
    for (int i_chunk = 0; i_chunk < chunk; i_chunk++)
    {
        for (int j_chunk = 0; j_chunk < chunk; j_chunk++)
        {
            printf("(%d, %d)\n", i+i_chunk, j+j_chunk);
            counter++;
        }
    }
}
```
Knoep