How can I optimize this S-curve function?

Question

I am working on a gamma function that generates an "S-curve". I need to run it in a realtime environment, so I need to speed it up as much as possible.

The code is as follows:

float Gamma = 2.0f; //Input Variable

float GammaMult = pow(0.5f, 1.0f-Gamma);
if(Input<1.0f && Input>0.0f)
{
    if(Input<0.5f)
    {
        Output = pow(Input,Gamma)*GammaMult;
    }
    else
    {
        Output  = 1.0f-pow(1.0f-Input,Gamma)*GammaMult;
    }
}
else
{
   Output  = Input;
}

Is there any way I can optimize this code?

float Gamma = 2.0f; //Input Variable should be dynamic of course, right? — Humam Helfawi, Jan 18 '16 at 12:28

Tamás Zahola · Accepted Answer · 2016-01-18T13:05:58.097

3

You can avoid pipeline stalls by eliminating the branch on Input<1.0f && Input>0.0f, provided the instruction set supports saturation arithmetic or you can use max/min intrinsics (e.g. x86 MAXSS).

You should also eliminate the other branch by rounding the saturated Input. Full algorithm:

float GammaMult = pow(0.5f, 1.0f-Gamma);
Input = saturate(Input); // saturate via assembly or intrinsics
// Input is now in [0, 1]
float Rounded = round(Input); // round via assembly or intrinsics
float Coeff = 1.0f - 2.0f * Rounded; // +1 if Rounded is 0, -1 if Rounded is 1
Output = Rounded + Coeff * pow(Rounded + Coeff * Input,Gamma)*GammaMult;

Rounding should be done via asm/intrinsics as well.

If you use this function on e.g. successive values of an array you should consider vectorising it if the target architecture supports SIMD.

edited Jan 18 '16 at 13:05

answered Jan 18 '16 at 12:49

Tamás Zahola


What is the advantage of rounding here? The original code doesn't appear to want the result rounded to an integer, and you're not eking out any more performance by avoiding floating-point ops in favor of integer ops as long as you use the `FRNDINT` instruction since that leaves the result on the floating point stack. – Cody Gray - on strike Jan 18 '16 at 13:04
@CodyGray rounding is used to generate the coefficients so that he won't need to branch on `Input < 0.5`. E.g.: `Coeff = 1 - 2 * Rounded` will be 1 if `Input < 0.5` and -1 if `Input > 0.5`, thus what was a branch in the original algorithm now becomes one round instruction, one floating point multiply and one floating point add --> pipelining won't suffer. – Tamás Zahola Jan 18 '16 at 13:07
Oh, of course! Very clever. I had missed that in my cursory reading. – Cody Gray - on strike Jan 18 '16 at 13:09

SIMDed `pow` is not very ubiquitous outside of high end compilers. – Rotem Jan 18 '16 at 13:10

score 0 · Answer 2 · edited May 23 '17 at 11:59


Your code seems fine. The bottleneck, if there is one, is the pow function. The only solution is to go a bit deeper into low-level details and try to implement your own pow function. For example, if two digits of precision are sufficient for you, you may find approximation-based algorithms that are faster.

See this: The most efficient way of implementing pow() function in floating point

edited May 23 '17 at 11:59

Community


answered Jan 18 '16 at 12:40

Humam Helfawi



Another good reference for implementing a custom optimized pow function is here: http://stackoverflow.com/a/16782797 – Cody Gray - on strike Jan 18 '16 at 13:14
