I'm studying a cache model and want to avoid writing tedious ARM assembly by hand, so I'm trying to get GCC to generate exactly the code I want. I have the following loop:
```c
for(int k = 0; k < 3; k++) {
    if(k == 0) {
        my_buf = buf0;
    } else {
        my_buf = buf1;
    }
    for(int i = 0; i < my_buf_size; i++) {
        data = my_buf[i] ^ data;
    }
}
```
This will run bare metal, so no buffer allocation etc. is needed. I want the first iteration of `k` to fill the I$ (instruction cache) with all of the loop's instructions, and the second iteration of `k` to miss completely in the D$ (data cache) because it loads a different buffer. I do want the compiler to unroll the `i` loop, so that I can saturate the loads without a branch on every iteration. I don't want the compiler to unroll the `k` loop, because I want the first iteration to bring all of the code into the I$.
Is there a way to use a `#pragma` to keep one specific loop from being unrolled? Or to disable unrolling of outer loops only?
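One possible approach, assuming GCC 8 or newer, is the per-loop `#pragma GCC unroll n`, which applies only to the loop immediately following it; a factor of 0 or 1 blocks unrolling of that loop. The sketch below is illustrative only: `MY_BUF_SIZE`, the `uint32_t` element type and the `xor_buffers` wrapper are hypothetical stand-ins for the question's `my_buf_size`, buffers and surrounding code, and the unroll factor of 8 is an arbitrary choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size and buffers standing in for buf0/buf1 above. */
#define MY_BUF_SIZE 1024

uint32_t buf0[MY_BUF_SIZE];
uint32_t buf1[MY_BUF_SIZE];

/* XOR-reduces buf0 once, then buf1 twice, mirroring the k loop above.
 * Assumes GCC 8+ for #pragma GCC unroll. */
uint32_t xor_buffers(uint32_t data)
{
    /* Factor 1: ask GCC not to unroll the outer k loop. */
#pragma GCC unroll 1
    for (int k = 0; k < 3; k++) {
        const uint32_t *my_buf = (k == 0) ? buf0 : buf1;

        /* Factor 8 (arbitrary): ask GCC to unroll only the inner i loop. */
#pragma GCC unroll 8
        for (int i = 0; i < MY_BUF_SIZE; i++) {
            data = my_buf[i] ^ data;
        }
    }
    return data;
}
```

The pragma is a request rather than a guarantee, so it is worth compiling with optimization enabled and inspecting the output (for example with `-S`) to confirm that the inner loop was unrolled and the outer loop was not.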