I'd like to know if someone has experience in writing a HAL AudioUnit rendering callback taking benefits of multi-core processors and/or symmetric multiprocessing?
My scenario is the following:
A single audio component of sub-type kAudioUnitSubType_HALOutput (together with its rendering callback) takes care of additively synthesizing n sinusoid partials with independent individually varying and live-updated amplitude and phase values. In itself it is a rather straightforward brute-force nested loop method (per partial, per frame, per channel).
However, upon reaching a certain upper limit for the number of partials "n", the processor gets overloaded and starts producing drop-outs, while three other processors remain idle.
Aside from general discussion about additive synthesis being "processor expensive" in comparison to let's say "wavetable", I need to know if this can be resolved right way, which involves taking advantage of multiprocessing on a multi-processor or multi-core machine? Breaking the rendering thread into sub-threads does not seem the right way, since the render callback is already a time-constraint thread in itself, and the final output has to be sample-acurate in terms of latency. Has someone had positive experience and valid methods in resolving such an issue?
System: 10.7.x
CPU: quad-core i7
Thanks in advance,
CA