I am wondering how to make sure I take advantage of CPU pipelining in the following audio code:
int sample_count = 100;
// volume array - value to multiply audio sample by
double volume[4][4];
// fill volume array with values here
// audio sample array - really this is 125 samples by 16 channels but smaller here for clarity
double samples[sample_count][4];
// fill samples array with audio samples here
double tmp[4];
for (int x = 0; x < sample_count; x++) {
    // compute all four outputs into tmp first so the inputs are not overwritten mid-mix
    tmp[0] = samples[x][0]*volume[0][0] + samples[x][1]*volume[1][0] + samples[x][2]*volume[2][0] + samples[x][3]*volume[3][0];
    tmp[1] = samples[x][0]*volume[0][1] + samples[x][1]*volume[1][1] + samples[x][2]*volume[2][1] + samples[x][3]*volume[3][1];
    tmp[2] = samples[x][0]*volume[0][2] + samples[x][1]*volume[1][2] + samples[x][2]*volume[2][2] + samples[x][3]*volume[3][2];
    tmp[3] = samples[x][0]*volume[0][3] + samples[x][1]*volume[1][3] + samples[x][2]*volume[2][3] + samples[x][3]*volume[3][3];
    samples[x][0] = tmp[0];
    samples[x][1] = tmp[1];
    samples[x][2] = tmp[2];
    samples[x][3] = tmp[3];
}
// write sample array out to hardware here.
In case it's not immediately clear, this mixes the 4 input channels into 4 output channels via a 4x4 matrix of volume controls.
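Put another way, each output frame is the input frame multiplied by the volume matrix: output j is the sum over i of input i times volume[i][j]. As a rough sketch, the same mix written with loops might look like the following. CHANNELS and mix_frame are names made up for this illustration, not part of my real code, and I have not checked whether the compiler treats this form any differently from the unrolled version.

```c
#include <assert.h>
#include <stddef.h>

#define CHANNELS 4  /* hypothetical constant; the real buffer has 16 channels */

/* Mix one frame in place: out[j] = sum over i of in[i] * volume[i][j]. */
static void mix_frame(double frame[CHANNELS],
                      double volume[CHANNELS][CHANNELS])
{
    assert(frame != NULL && volume != NULL);

    /* Accumulate into tmp so no input is overwritten before it is used. */
    double tmp[CHANNELS];
    for (int out = 0; out < CHANNELS; out++) {
        double acc = 0.0;
        for (int in = 0; in < CHANNELS; in++)
            acc += frame[in] * volume[in][out];
        tmp[out] = acc;
    }

    /* Copy the mixed frame back over the input. */
    for (int out = 0; out < CHANNELS; out++)
        frame[out] = tmp[out];
}
```

Calling mix_frame(samples[x], volume) for each x would then stand in for the body of the loop above.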
I'm actually running this far more intensively than the example above, and I am not sure how to tailor my code to take advantage of pipelining (which this seems well suited to). Should I perhaps work on one 'channel' of the samples array at a time, so that the same volume value can be reused several times in a row (for sequential samples of the same channel)? The downside is that I would have to compare x against sample_count four times as often. I could make tmp two-dimensional and large enough to hold the full buffer, if working through it this way would let the CPU pipeline efficiently. Or will the code above already pipeline efficiently? Is there an easy way to check whether pipelining is happening? TIA.
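To make the channel-at-a-time idea concrete, here is a minimal sketch of what I have in mind, assuming tmp is enlarged to hold the whole buffer. CHANNELS and mix_by_channel are hypothetical names for this illustration only, and I have not measured whether this ordering is any faster than the frame-at-a-time loop.

```c
#include <assert.h>
#include <stddef.h>

#define CHANNELS 4  /* hypothetical constant; the real buffer has 16 channels */

/* Channel-at-a-time variant: finish one output channel for the whole
   buffer before starting the next. tmp must have n rows; it holds the
   mixed buffer so inputs are not overwritten while still needed. */
static void mix_by_channel(int n,
                           double samples[][CHANNELS],
                           double tmp[][CHANNELS],
                           double volume[CHANNELS][CHANNELS])
{
    assert(n >= 0 && samples != NULL && tmp != NULL && volume != NULL);

    for (int out = 0; out < CHANNELS; out++) {
        /* The four gains feeding this output channel stay fixed for the
           whole inner loop, so load them once. */
        const double v0 = volume[0][out];
        const double v1 = volume[1][out];
        const double v2 = volume[2][out];
        const double v3 = volume[3][out];

        for (int x = 0; x < n; x++)
            tmp[x][out] = samples[x][0] * v0 + samples[x][1] * v1
                        + samples[x][2] * v2 + samples[x][3] * v3;
    }

    /* Copy the mixed buffer back over the input buffer. */
    for (int x = 0; x < n; x++)
        for (int c = 0; c < CHANNELS; c++)
            samples[x][c] = tmp[x][c];
}
```

This runs the x loop once per output channel, which is the four-fold increase in loop checks mentioned above, plus an extra copy pass at the end.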