Let's say I have to multiply two arrays such as A[MAX_BUFFER] and B[MAX_BUFFER] (with MAX_BUFFER = 256).
For some reason, the B values are calculated at a fixed control rate (8, for example), since each value is heavily processed. Later, I need to multiply them element-wise into C[MAX_BUFFER], taking the (introduced) different spacing into account. So with A holding 256 values, B has a variable effective size (32 in this example, since the control rate is 8).
Here's an example code:
#include <iostream>
#include <cmath>

#define MAX_BUFFER 256

double HeavyFunction(double value) {
    if (value == 0) return 0.0;
    return pow(10.0, value); // heavy operations on value...
}

int main()
{
    int blockSize = 256;
    int controlRate = 8;

    double A[MAX_BUFFER];
    double B[MAX_BUFFER];
    double C[MAX_BUFFER];

    // fill A
    for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
        A[sampleIndex] = sampleIndex;
    }

    // fill B (control rated)
    int index = 0;
    for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex += controlRate, index++) {
        B[index] = HeavyFunction(index);
    }

    // calculate C
    for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
        C[sampleIndex] = A[sampleIndex] * B[sampleIndex / controlRate];
        std::cout << C[sampleIndex] << std::endl;
    }
}
I need performance, since I'll run lots of these operations in parallel, sending a lot of data every second (something like 44100 samples, split into blocks of blockSize <= MAX_BUFFER).
I'd like to avoid branches (i.e. if) and divisions (as in the example above), which don't suit the CPU well when processing a big amount of data. In the example above, the sampleIndex / controlRate in the loop introduces N "futile" division operations per block; think what happens if I call that procedure for millions of samples...
How would you refactor this code in a fancy, CPU-friendly way?