This is the scalar code I currently have, replicated four times so the results can be stored into SIMD lanes:
waveTable *waveTables[4];
for (int i = 0; i < 4; i++) {
    int waveTableIindex = 0;
    // test the index bound before reading mTopFreq
    while ((waveTableIindex < kNumWaveTableSlots) && (phaseIncrement[i] >= mWaveTables[waveTableIindex].mTopFreq)) {
        waveTableIindex++;
    }
    waveTables[i] = &mWaveTables[waveTableIindex];
}
It's not any faster, of course. How would you do the same with SIMD to save CPU? Any tips or a starting point? I'm targeting SSE2.
Here's the context of the computation. The topFreq of the first wave table is derived from the maximum number of harmonics (times 2, because of Nyquist). It is then doubled for each subsequent wave table, while the number of harmonics available to that table is halved:
double topFreq = 1.0 / (maxHarmonic * 2);
while (maxHarmonic) {
    // fill the table in with the needed harmonics
    // ... makeWaveTable() code
    // prepare for next table
    topFreq *= 2;
    maxHarmonic >>= 1;
}
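To make the thresholds concrete, here is a minimal sketch that runs the same doubling loop and only collects each table's topFreq. `computeTopFreqs` is a hypothetical helper, not part of the original code, and the table-filling step is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not in the original code): returns the topFreq
// threshold of every wave table, following the doubling loop above.
// The table contents themselves are not built here.
std::vector<double> computeTopFreqs(int maxHarmonic) {
    assert(maxHarmonic > 0);
    std::vector<double> topFreqs;
    double topFreq = 1.0 / (maxHarmonic * 2);
    while (maxHarmonic) {
        topFreqs.push_back(topFreq);  // threshold for the current table
        topFreq *= 2;                 // next table covers twice the range
        maxHarmonic >>= 1;            // and holds half the harmonics
    }
    return topFreqs;
}
```

With maxHarmonic = 500 this yields 9 thresholds, from 0.001 up to 0.256 cycles per sample, which at 44100 Hz corresponds to 44.1 Hz up to 11289.6 Hz.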
Then, during processing, for each sample I need to pick the correct wave table for the oscillator's frequency (i.e. its phase increment):
freq = clamp(freq, 20.0f, 22050.0f);
phaseIncrement = freq * vSampleTime;
So, for example, with vSampleTime = 1/44100 and maxHarmonic = 500, the first threshold is 0.001 × 44100 = 44.1 Hz, so 30 Hz maps to wave table 0, 50 Hz to wave table 1, and so on.
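Because the thresholds only increase, the lookup at the top can be restated as "count how many thresholds are less than or equal to the phase increment", which has no data-dependent branch and so maps onto lane-wise compares. The following is a rough sketch of that idea, not a tuned implementation. It assumes the thresholds are available as a plain `std::vector<float>` (hypothetical, the original stores them in `mWaveTables[...].mTopFreq`) and that the index should be clamped to the last table. `selectTableIndex` and `selectTableIndex4` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Scalar reference: count the thresholds <= phaseIncrement, skipping the
// last table's threshold so the result never exceeds the last valid index.
int selectTableIndex(float phaseIncrement, const std::vector<float>& topFreqs) {
    assert(!topFreqs.empty());
    int index = 0;
    for (std::size_t t = 0; t + 1 < topFreqs.size(); ++t) {
        index += (phaseIncrement >= topFreqs[t]) ? 1 : 0;
    }
    return index;
}

#if defined(__SSE2__)
// Same count for four phase increments at once. A true compare yields an
// all-ones lane, which is -1 as a 32-bit integer, so subtracting the mask
// adds 1 to the index of every lane that passed the threshold.
__m128i selectTableIndex4(__m128 phaseIncrement, const std::vector<float>& topFreqs) {
    assert(!topFreqs.empty());
    __m128i index = _mm_setzero_si128();
    for (std::size_t t = 0; t + 1 < topFreqs.size(); ++t) {
        const __m128 mask = _mm_cmpge_ps(phaseIncrement, _mm_set1_ps(topFreqs[t]));
        index = _mm_sub_epi32(index, _mm_castps_si128(mask));
    }
    return index;
}
#endif
```

The loop always visits every threshold instead of exiting early, so whether it actually saves CPU depends on the number of tables and would need measuring. SSE2 also has no gather instruction, so the four indices still have to be stored and read back one by one to fetch the `waveTable` pointers.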