Assume I have to write a computationally intensive C or C++ function that takes 2 arrays as input and produces one array as output. If the computation reads the 2 input arrays more often than it updates the output array, I'll end up in a situation where the output array is seldom in the cache, because its lines are evicted to make room for the 2 input arrays.
I want to reserve a fraction of the cache for the output array and somehow ensure that those lines don't get evicted once they are fetched, so that partial results are always written to the cache.
Update1(output[]); // Output gets cached
DoCompute1(input1[]); // Input 1 gets cached
DoCompute2(input2[]); // Input 2 gets cached
Update2(output[]); // Output is not in the cache anymore and has to get cached again
...
I know there are mechanisms to force eviction: clflush, clevict, _mm_clevict, etc. Are there any mechanisms for the opposite?
I am thinking of 3 possible solutions:
- Using _mm_prefetch from time to time to fetch the data back if it has been evicted. However, this might generate unnecessary traffic, and I would need to be very careful about where to insert the prefetches;
- Trying to do the processing on smaller chunks of data. However, this would work only if the problem allows it;
- Disabling hardware prefetchers where that's possible to reduce the rate of unwanted evictions.
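To make the first two options concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the name compute_tiled, the tile size, the 64-byte cache line, and the trivial arithmetic standing in for the real computation. Note that _mm_prefetch is only a hint; it does not pin the lines, so this reduces the chance of a miss rather than preventing eviction.

```c
#include <assert.h>
#include <stddef.h>

/* _mm_prefetch is x86-only; fall back to a no-op elsewhere. */
#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)(p))
#endif

/* Hypothetical tuning values: a tile small enough that one tile of each
 * array fits in cache together, and a 64-byte line (16 floats). */
enum { TILE = 1024, LINE_FLOATS = 16 };

/* Hypothetical stand-in for the real kernel: each tile of the output is
 * updated twice, once per input, before moving on to the next tile. */
static void compute_tiled(const float *in1, const float *in2,
                          float *out, size_t n)
{
    assert(in1 != NULL && in2 != NULL && out != NULL);

    for (size_t base = 0; base < n; base += TILE) {
        size_t end = (n - base < TILE) ? n : base + TILE;

        /* Option 1: hint that the output tile should be brought into cache. */
        for (size_t i = base; i < end; i += LINE_FLOATS)
            PREFETCH_T0(&out[i]);

        /* Option 2: both updates touch the same small output tile, so it is
         * likely to still be cached when the second update runs. */
        for (size_t i = base; i < end; ++i)
            out[i] += in1[i];
        for (size_t i = base; i < end; ++i)
            out[i] += in2[i];
    }
}
```

The point of the tiling is that the working set per iteration (one tile of each of the three arrays) stays small, so the output tile is less likely to be evicted between the two updates. The prefetch loop is optional and would need measuring, since it may only add traffic.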
Other than that, is there any elegant solution?