Fast inverse arbitrary power root algorithm implementation

Question

Many sources indicate that well-known fast inverse square root algorithm can be generalized to calculation arbitrary power inverse root. Unfortunately I have not found such C++ implementation and I'm not so good at math to generalize this method by myself. Could you help me to do this or perhaps provide a ready-made solution? I think this will be useful to many, especially with good explanations.

This is the original algorithm and I do not quite understand what I need to change to get for example 1 /cbrt(x):

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the...? 
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

You need to change the calculation of the initial approximation (magic constant) and construct the corresponding to power newton steps. — Dmytro Dadyka, Mar 04 '19 at 20:04
For some values of `n` (including reciprocal cube root) I gave optimal low-accuracy starting approximations in [this question](https://stackoverflow.com/questions/32042673/optimized-low-accuracy-approximation-to-rootnx-n). — njuffa, Mar 05 '19 at 18:07

Dmytro Dadyka · Accepted Answer · 2019-03-05T00:00:01.843

The algorithm consists of two steps - a rough solution estimation and the solution improvement using several Newton method steps.

Rough estimation

The basic idea is to use relationship between float number logarithm log2(x) and its integer representation Ix:

$I_{x}= Llog_{2}(x)+L(B-\sigma); log_{2}(x)\approx \frac{I_{x}}{L}-(B-\sigma)$

(Image from https://en.wikipedia.org/wiki/Fast_inverse_square_root)

Now use the well-known logarithm identity for root:

$log_{2}(\frac{1}{\sqrt[n]{x}})=-\frac{1}{n}log_{2}(x)$ .

Combining the identities obtained earlier, we get:

$\frac{I_{y}}{L}-(B-\sigma )=-\frac{1}{c}(\frac{I_{x}}{L}-(B-\sigma ))$

$I_{y}= \frac{(n+1)L(B-\sigma)}{n} -\frac{I_{x}}{n}$

Substituting numerical values L * (B - s) = 0x3F7A3BEA, so

Iy = 0x3F7A3BEA / c * (c + 1) - Ix / c;.

For a simple float point number representation as an integer and back it is convenient to use union type:

   union 
   { 
      float f; // float representation
      uint32_t i; // integer representation
   } t;

   t.f = x;
   t.i = 0x3F7A3BEA / n * (n + 1) - t.i / n; // Work with integer representation
   float y = t.f; // back to float representation

Note that for n=2 the expression is simplified to t.i = 0x5f3759df - t.i / 2; which is identical to original i = 0x5f3759df - ( i >> 1 );

Newton's solution improvement

Transform equality

into an equation that should be solved:

$x=\frac{1}{y^{n}};f(y)=\frac{1}{y^{n}}-x;f(y)=0$

Now construct Newton steps:

${f(y)}'=-\frac{n}{y^{n+1}}$

$y_{i+1}=y_{i}-f(y_{i})/{f}'(y_{i})=y_{i}+\frac{(\frac{1}{y^{n}_{i}}-x)y^{n+1}_{i}}{n}=\frac{y_{i}(1+n-xy_{i}^{n})}{n}$

Programmatically it looks like: y = y * (1 + n - x * pow(y,n)) / n;. As an initial y, we use the value obtained at Rough estimation step.

Note for the particular case of square root (n = 2) we get y = y * (3 - x*y*y) / 2; which is identically to the original formula y = y * (threehalfs - (x2 * y * y));

Final code as template function. Parameter N determines root power.

template<unsigned N>
float power(float x) {
   if (N % 2 == 0) return power<N / 2>(x * x);
   else if (N % 3 == 0) return power<N / 3>(x * x * x);
   return power<N - 1>(x) * x;
};

template<>
float power<0>(float x){ return 1; }

// fast_inv_nth_root<2>(x) - inverse square root 1/sqrt(x)
// fast_inv_nth_root<3>(x) - inverse cube root 1/cbrt(x)

template <unsigned n>
float fast_inv_nth_root(float x)
{
   union { float f; uint32_t i; } t = { x };

   // Approximate solution
   t.i = 0x3F7A3BEA / n * (n + 1) - t.i / n;
   float y = t.f;

   // Newton's steps. Copy for more accuracy.
   y = y * (n + 1 - x * power<n>(y)) / n;
   y = y * (n + 1 - x * power<n>(y)) / n;
   return y;
}

Testing

Testing code:

int main()
{
   std::cout << "|x          ""|fast2      "" actual2    "
      "|fast3      "" actual3    "
      "|fast4      "" actual4    "
      "|fast5      "" actual5    ""|" << std::endl;

   for (float i = 0.00001; i < 10000; i *= 10)
      std::cout << std::setprecision(5) << std::fixed
      << std::scientific << '|'
      << i << '|'
      << fast_inv_nth_root<2>(i) << " " << 1 / sqrt(i) << "|"
      << fast_inv_nth_root<3>(i) << " " << 1 / cbrt(i) << "|"
      << fast_inv_nth_root<4>(i) << " " << pow(i, -0.25) << "|"
      << fast_inv_nth_root<5>(i) << " " << pow(i, -0.2) << "|"
      << std::endl;
}

Results:

|x          |fast2       actual2    |fast3       actual3    |fast4       actual4    |fast5       actual5    |
|1.00000e-05|3.16226e+02 3.16228e+02|4.64152e+01 4.64159e+01|1.77828e+01 1.77828e+01|9.99985e+00 1.00000e+01|
|1.00000e-04|9.99996e+01 1.00000e+02|2.15441e+01 2.15443e+01|9.99991e+00 1.00000e+01|6.30949e+00 6.30957e+00|
|1.00000e-03|3.16227e+01 3.16228e+01|1.00000e+01 1.00000e+01|5.62339e+00 5.62341e+00|3.98103e+00 3.98107e+00|
|1.00000e-02|9.99995e+00 1.00000e+01|4.64159e+00 4.64159e+00|3.16225e+00 3.16228e+00|2.51185e+00 2.51189e+00|
|1.00000e-01|3.16227e+00 3.16228e+00|2.15443e+00 2.15443e+00|1.77828e+00 1.77828e+00|1.58487e+00 1.58489e+00|
|1.00000e+00|9.99996e-01 1.00000e+00|9.99994e-01 1.00000e+00|9.99991e-01 1.00000e+00|9.99987e-01 1.00000e+00|
|1.00000e+01|3.16226e-01 3.16228e-01|4.64159e-01 4.64159e-01|5.62341e-01 5.62341e-01|6.30948e-01 6.30957e-01|
|1.00000e+02|9.99997e-02 1.00000e-01|2.15443e-01 2.15443e-01|3.16223e-01 3.16228e-01|3.98102e-01 3.98107e-01|
|1.00000e+03|3.16226e-02 3.16228e-02|1.00000e-01 1.00000e-01|1.77827e-01 1.77828e-01|2.51185e-01 2.51189e-01|
|1.00000e+04|9.99996e-03 1.00000e-02|4.64155e-02 4.64159e-02|9.99995e-02 1.00000e-01|1.58487e-01 1.58489e-01|

I don't think you can insert Latex on SO. Maybe screenshot, crop your math formula and upload it as an image. Or use Google Chart API as directed in this thread: https://meta.stackexchange.com/questions/76902/how-can-i-write-math-formula-in-a-post — tnt, Mar 04 '19 at 21:35
@DmytroDadyka you can't. mathexchange, stats etc supports MathJax, but not stackoverflow - they think it is too much of a burden for rarely used feature — Severin Pappadeux, Mar 04 '19 at 22:43

Fast inverse arbitrary power root algorithm implementation

1 Answers1

Rough estimation

Newton's solution improvement

Testing