During a lecture my professor gave us the following loop:
for (int i = 0; i < 100; i++) {
    a[i] = a[i] + b[i];
    b[i + 1] = c[i] + d[i];
}
He pointed out a dependency between iterations of the loop: line 3 writes b[i+1], which the next iteration reads as b[i] on line 2. Therefore we can't run the iterations of the loop in parallel.
He then gave us this unrolled version:
a[1] = a[1] + b[1];
for (int i = 0; i < 98; i++) {
    b[i+1] = c[i] + d[i];
    a[i+1] = a[i] + b[i];
}
b[99] = c[99] + d[99];
He claims that the iterations of this loop can now be run in parallel. The problem I see is that line 3 writes b[i+1], which the next iteration reads as b[i] on line 4, so we still can't run the iterations in parallel.
Am I right in saying that? If so, is there a properly unrolled version of the first loop where each iteration can be parallelized?
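To make the question concrete, here is a rough sketch in C of what I would have expected the transformed loop to look like. This is my own guess, not what was shown in the lecture. The names N, original, shifted and versions_agree are made up for illustration, and I am assuming int arrays that do not overlap, with b holding N + 1 elements because the original loop writes b[100]. The idea is to shift the two statements so that each iteration reads only the b[i+1] it has just written:

```c
#include <string.h>

#define N 100  /* assumed trip count, taken from the original loop */

/* Original loop: iteration i reads b[i], which iteration i-1 wrote. */
void original(int *a, int *b, const int *c, const int *d) {
    for (int i = 0; i < N; i++) {
        a[i] = a[i] + b[i];
        b[i + 1] = c[i] + d[i];
    }
}

/* Shifted sketch: the first a-update and the last b-update are peeled off,
 * so every iteration touches only a[i+1], b[i+1], c[i] and d[i]. */
void shifted(int *a, int *b, const int *c, const int *d) {
    a[0] = a[0] + b[0];
    for (int i = 0; i < N - 1; i++) {
        b[i + 1] = c[i] + d[i];
        a[i + 1] = a[i + 1] + b[i + 1];
    }
    b[N] = c[N - 1] + d[N - 1];
}

/* Runs both versions on identical (arbitrary) inputs.
 * Returns 1 if a and b end up the same in both, 0 otherwise. */
int versions_agree(void) {
    int a1[N], a2[N], b1[N + 1], b2[N + 1], c[N], d[N];

    for (int i = 0; i < N; i++) {
        a1[i] = a2[i] = i;
        c[i] = 2 * i;
        d[i] = 3 * i + 1;
    }
    for (int i = 0; i <= N; i++) {
        b1[i] = b2[i] = N - i;
    }

    original(a1, b1, c, d);
    shifted(a2, b2, c, d);

    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(b1, b2, sizeof b1) == 0;
}
```

As far as I can tell, no iteration of the loop in shifted reads anything that another iteration writes, which is the property I thought the professor's version was supposed to have.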