I am going to analyse and optimize some C code, so I first need to check whether the functions I want to optimize are memory-bound or CPU-bound. In general I know how to do this, but I have some questions about counting floating-point operations and determining how much data is moved. Consider the following for-loop, which I want to analyse. The array elements are doubles (8 bytes each):
    for (int j = 0; j < N; j++) {
        for (int i = 1; i < Nt; i++) {
            matrix[j*Nt+i] = matrix[j*Nt+i-1] * mu + matrix[j*Nt+i] * sigma;
        }
    }
1) How many floating-point operations do you count? I counted 3*(Nt-1)*N, but do I also have to count the operations inside the array subscripts (e.g. matrix[j*Nt+i], which would be 2 more FLOP for each such access)?
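To make the two ways of counting concrete, here is a minimal sketch that runs the same loop and tallies the operations as they appear in the source, keeping the arithmetic on doubles separate from the integer index arithmetic. The `op_count` struct and the `count_ops` function are hypothetical names introduced only for this illustration, and the tally reflects the source text, not what a compiler actually emits (it may fuse a multiply and add, or simplify the index computation).

```c
#include <assert.h>

/* Hypothetical tally type, introduced only for this illustration. */
typedef struct {
    long long fp_ops;   /* multiplies and adds on doubles */
    long long int_ops;  /* integer operations inside the array subscripts */
} op_count;

/* Runs the loop from the question and counts operations as written in
 * the source. The generated machine code may contain fewer operations. */
op_count count_ops(double *matrix, int N, int Nt, double mu, double sigma)
{
    assert(N >= 0 && Nt >= 1);
    op_count c = {0, 0};
    for (int j = 0; j < N; j++) {
        for (int i = 1; i < Nt; i++) {
            matrix[j*Nt + i] = matrix[j*Nt + i - 1] * mu
                             + matrix[j*Nt + i] * sigma;
            c.fp_ops  += 3;  /* 2 multiplies + 1 add on doubles */
            c.int_ops += 7;  /* 3 subscripts of the form j*Nt+i, plus one "-1" */
        }
    }
    return c;
}
```

For N = 4 and Nt = 5 this gives 48 operations on doubles, which matches 3*(Nt-1)*N, and 112 integer operations in the subscripts.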
2) How much data is transferred? 2*(Nt-1)*N*8 bytes or 3*(Nt-1)*N*8 bytes? Every entry of the matrix has to be loaded. After the calculation, the new value is stored to that index of the array (so far that is 1 load and 1 store). But this value is then used in the next calculation. Is another load needed for it, or is the value (matrix[j*Nt+i-1]) already available without a load?
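To compare the two candidate estimates, a small sketch like the following computes the bytes moved and the resulting arithmetic intensity (FLOP per byte) for either assumption. The function names and the `accesses_per_iter` parameter are my own, not from any library. The sketch uses 3 FLOP per inner iteration and 8 bytes per double as stated above, and it ignores cache-line granularity and the one extra load of the first element in each row.

```c
#include <assert.h>

#define BYTES_PER_DOUBLE 8.0  /* as stated in the question */

/* Bytes moved for a given number of 8-byte accesses per inner iteration:
 * 2 means 1 load + 1 store, 3 means 2 loads + 1 store. */
double traffic_bytes(int N, int Nt, int accesses_per_iter)
{
    assert(N >= 0 && Nt >= 1 && accesses_per_iter > 0);
    return (double)accesses_per_iter * (double)(Nt - 1) * (double)N
           * BYTES_PER_DOUBLE;
}

/* Arithmetic intensity in FLOP per byte, assuming 3 FLOP per iteration. */
double arithmetic_intensity(int N, int Nt, int accesses_per_iter)
{
    assert(N > 0 && Nt > 1);  /* avoid dividing by zero bytes */
    double flops = 3.0 * (double)(Nt - 1) * (double)N;
    return flops / traffic_bytes(N, Nt, accesses_per_iter);
}
```

The intensity does not depend on N or Nt: it is 3/16 = 0.1875 FLOP/byte with 2 accesses per iteration and 3/24 = 0.125 FLOP/byte with 3.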
Thanks a lot!