I have this inline function from a PolyBLEP oscillator that I am using to build a synthesizer. I was wondering whether I could make it more efficient, either by using intrinsics or by refactoring the code so that the compiler can apply SIMD instructions automatically when it compiles. Any other approach that does not use explicit vector code, just plain refactoring, would be fine too. I am after some speed improvement because the function is a little taxing. Thank you!

inline double blep(double t, double dt) {
    if (t < dt) {
        return -square_number(t / dt - 1);
    }
    else if (t > 1 - dt) {
        return square_number((t - 1) / dt + 1);
    }
    else {
        return 0;
    }
}

It uses a lot of subtractions and divisions, plus some branching. Is there a way to speed this up a little to save CPU time?

This is C++17 code compiled with Visual Studio 2019. Any suggestions would be appreciated, thank you!

Source Code https://github.com/martinfinke/PolyBLEP

EDIT: t and dt are variable (non-static) incoming values: the phase position (t) and the frequency/pitch (dt).

Davdson
  • You'd have to change the caller to do much, e.g. doing 2 or 4 (AVX) bleps at once on different `t` values with intrinsics. And/or have the caller pass a reciprocal of `dt` so you're multiplying instead of dividing. – Peter Cordes Jul 02 '21 at 17:22

1 Answer

Probably the easiest and most effective optimization is to replace the division by a multiplication with the inverse (assuming dt does not change). But you can also avoid all the branches.
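A minimal sketch of the first suggestion, keeping the original branches and only swapping the division for a multiplication. The name `blep_inv`, the extra `inv_dt` parameter, and the one-line `square_number` helper are assumptions made for illustration. The caller is assumed to recompute `inv_dt = 1.0 / dt` only when the pitch changes.

```cpp
// Hypothetical stand-in for the square_number helper used in the question.
inline double square_number(double x) { return x * x; }

// Same branches as the original blep, but multiplies by inv_dt instead of
// dividing by dt. Assumption: the caller keeps inv_dt == 1.0 / dt up to date.
inline double blep_inv(double t, double dt, double inv_dt) {
    if (t < dt) {
        return -square_number(t * inv_dt - 1.0);
    }
    if (t > 1.0 - dt) {
        return square_number((t - 1.0) * inv_dt + 1.0);
    }
    return 0.0;
}
```

Multiplying by a precomputed reciprocal can differ from a true division in the last bit of the result, which is normally irrelevant for audio but is the reason compilers do not make this substitution on their own under strict floating-point settings.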

Notice that your function is point-symmetric around t=0.5 (assuming dt<0.5), i.e. f(0.5-x) = -f(0.5+x), and that the expression inside the square function can be rewritten as (abs(t-0.5)-0.5)/dt+1 in both branches (since square(-x)=square(x)).

Now, if and only if both branches fail, then (for dt>0)

--> dt <= t && t <= 1-dt
--> dt-0.5 <= t-0.5 <= 0.5-dt
--> abs(t-0.5) <= 0.5-dt
--> abs(t-0.5) - 0.5 <= -dt
--> (abs(t-0.5) - 0.5)/dt <= -1
--> (abs(t-0.5) - 0.5)/dt + 1 <= 0

That means, we can write max((abs(t-0.5) - 0.5)/dt + 1, 0) instead (squaring 0 is still 0, of course) and summarize:

blep(t,dt) = sign(t-0.5)*square(max(0,(abs(t-0.5)-0.5)/dt+1))

or with C++:

double s = t-0.5;
return std::copysign(square_number(std::max(0.0, (std::abs(s)-0.5)*(1.0/dt)+1.0)), s);

Calculation of 1/dt should of course be factored out (your compiler might be able to do this), and copysign as well as abs should compile to some simple bit-twiddling operations (check the generated assembly for your compiler).
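For reference, here is the snippet above wrapped into a complete function with the reciprocal passed in by the caller. This is a sketch under stated assumptions: `blep_branchless`, the `inv_dt` parameter, and the one-line `square_number` are illustrative names, and it assumes 0 < dt < 0.5 and t in [0, 1).

```cpp
#include <algorithm>  // std::max
#include <cmath>      // std::abs, std::copysign

// Hypothetical stand-in for the square_number helper used in the question.
inline double square_number(double x) { return x * x; }

// Branchless blep. Assumptions: 0 < dt < 0.5, t in [0, 1), and the caller
// passes inv_dt == 1.0 / dt, recomputed only when the pitch changes.
inline double blep_branchless(double t, double inv_dt) {
    const double s = t - 0.5;  // the sign of s selects the sign of the result
    // Clamps to 0 in the middle region, where the original returns 0.
    const double x = std::max(0.0, (std::abs(s) - 0.5) * inv_dt + 1.0);
    return std::copysign(square_number(x), s);
}
```

One small difference from the branching version: in the middle region with t < 0.5 the result is -0.0 rather than +0.0, because `std::copysign` transfers the sign of `s` onto the zero. The two zeros compare equal, and adding either one to a nonzero sample leaves it unchanged.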

All operations can be vectorized without problems, but you probably need to refactor your surrounding code to do that.
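As a rough illustration of what such a refactoring could look like, the sketch below evaluates the branchless formula over a whole block of phase values that share one dt. The function `blep_block` and its parameters are hypothetical and not part of the PolyBLEP library. Whether a given compiler actually vectorizes the loop has to be checked in the generated assembly.

```cpp
#include <algorithm>  // std::max
#include <cmath>      // std::abs, std::copysign
#include <cstddef>    // std::size_t

// Hypothetical batch helper: computes blep for n phase values t[0..n) that
// share a single dt and writes the results to out[0..n).
// Assumptions: 0 < dt < 0.5, every t[i] in [0, 1), t and out do not overlap.
inline void blep_block(const double* t, double* out, std::size_t n, double dt) {
    const double inv_dt = 1.0 / dt;  // one division per block, not per sample
    for (std::size_t i = 0; i < n; ++i) {
        const double s = t[i] - 0.5;
        const double x = std::max(0.0, (std::abs(s) - 0.5) * inv_dt + 1.0);
        out[i] = std::copysign(x * x, s);  // no branches in the loop body
    }
}
```

The loop body has no branches and no dependence between iterations, which is the shape auto-vectorizers look for. With MSVC the auto-vectorizer runs at `/O2`, and `/arch:AVX2` permits wider instructions. The caller still has to produce the phase values as an array first, which is the change to the surrounding code mentioned above.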

chtz
  • Thanks chtz for the response! t and dt are variables and would change at audio rate. Instance variables: t (example: phase = osc.t;) is the current phase [0.0..1.0) of the oscillator; dt (example: freq = osc.dt * srate;) is the oscillator frequency, in seconds/sample. – Davdson Jul 05 '21 at 16:06
  • I tried refactoring to AVX2 and SSE2 intrinsics, but I am not yet familiar with NEON, and I need to find a library like Agner Fog's for NEON. ARM is presenting a challenge: there seem to be two options out there, but they don't work exactly like the Agner Fog library, which is fairly simple to rewrite the code for. – Davdson Jul 05 '21 at 16:12
  • There may be a third solution: it seems I can write regular C++ code in a way that lets the compiler apply NEON, AVX2, etc. automatically, depending on the OS/CPU it is compiled for. I have seen a couple of examples of loop unrolling. It is also a bit foreign to me, but it may be the easiest approach in the long run: no library and no multiple sets of code. I hope this can be a solution. – Davdson Jul 05 '21 at 16:16
  • As long as `dt` does not change for every call to your function, it is worth factoring out the division. And yes, if you have code which is simple enough to vectorize and compile with `-O3` (and `-march=native`) gcc and clang will auto-vectorize it (check the assembly your compiler generates). – chtz Jul 05 '21 at 17:55
  • dt represents pitch, so it changes when the note changes: not on every call, only when the pitch changes. – Davdson Jul 09 '21 at 16:37