
As I said, I want to implement my own double-precision cos() function in a GLSL compute shader, because there is only a built-in version for float.

This is my code:

double faculty[41]; // factorial values, computed at the beginning of main()

double myCOS(double x)
{
    double sum,tempExp,sign;
    sum = 1.0;
    tempExp = 1.0;
    sign = -1.0;

    for(int i = 1; i <= 30; i++)
    {
        tempExp *= x;
        if(i % 2 == 0){
            sum = sum + (sign * (tempExp / faculty[i]));
            sign *= -1.0;
        }
    }
    return sum;
}

The result is that sum turns out to be NaN on the shader, but on the CPU the algorithm works well. I also tried to debug this code and got the following information:

  • faculty[i] is positive and not zero for all entries
  • tempExp is positive in each step
  • none of the other variables are NaN during each step
  • the first time sum is NaN is at the step with i=4

And now my question: what exactly can go wrong if each variable is a number and nothing is divided by zero, especially when the algorithm works on the CPU?

DanceIgel
    infinity divided by infinity, for example. What's your input data? – Wintermute Mar 05 '15 at 11:53
  • http://stackoverflow.com/a/4430934/17034 – Hans Passant Mar 05 '15 at 12:29
  • @Wintermute I am not really sure about that and it is hard to estimate exact values in a shader. A part of it is positive and the rest is negative. But I will try to estimate them. – DanceIgel Mar 05 '15 at 12:38
  • @Wintermute Ok. It seems that the values are really big. So it will just reach the border when multiplied with itself. – DanceIgel Mar 05 '15 at 13:42
  • @Dancelgel: GLSL is a little weird in that the precision and range of built-in floating-point functions is allowed to vary. You should review pp. 83-84 of the [GLSL 4.50 specification](https://www.opengl.org/registry/doc/GLSLangSpec.4.50.pdf#page=89) for more details. Support for NaN is not even required by GLSL, but pretty much all DX11 hardware (and by extension, anything with double-precision support) has it. – Andon M. Coleman Mar 06 '15 at 14:25
  • If the algorithm you are using is stable, it should not explode even with lower precision on the GPU side (certain algorithms start to behave strangely once you reach the wanted precision; e.g. try using `float` and 80 iterations to see what happens on the CPU). The problem may be fixed-point precision or reduced precision on the GPU, or the algorithm itself. Have you tried running the algorithm both in a vertex and in a fragment shader? You can also use preprocessor directives to force precision on the GPU, and make sure you align the data you pass to the GPU. Also, if you want to write plain C, use OpenCL/CUDA, not GLSL. – CoffeDeveloper Mar 07 '15 at 12:56
  • Have you used pen and paper to calculate the values each time round the loop? Do this for until you hit `i=4` and then see what the output is. Check that this value can be represented in your data type. – ChrisF Apr 16 '15 at 12:51
  • Giving sample values of faculty, or the code that initializes it, will greatly help. – Yakov Galka May 24 '15 at 17:12
  • What is the value of x that is causing the NaN? The first step in a cos algorithm would be to get x within the range -pi <= x <= pi to avoid these problems. – 1201ProgramAlarm Sep 26 '15 at 01:08
  • Why don't you compute this a polynomial in x^2 using Horner's scheme? – Walter Oct 29 '15 at 20:37
  • @Dancelgel I don't want to rain on your parade, but unless you are doing this for your own formation or out of curiosity, odds are that just the function for sin/cos computation will eat most (if not all) the performances benefit of GPU outsourcing... CUDA/OpenCL/DirectCompute should support double precision operations and are probably better candidates for whatever you are doing, if you can afford them. – Rick77 Feb 17 '16 at 08:44
  • @Rick77 But it is okay for my purpose. I used it to show that an algorithm is less efficient than another implemented with a compute shader. – DanceIgel Feb 17 '16 at 10:32
  • @Dancelgel, never mind then. Good luck! – Rick77 Feb 17 '16 at 10:37

2 Answers


Let me guess:

First, you determined that the problem is in the loop, which uses only the following operations: +, *, /.

The rules for generating NaN from these operations are:

  • The divisions 0/0 and ±∞/±∞
  • The multiplications 0×±∞ and ±∞×0
  • The additions ∞ + (−∞), (−∞) + ∞ and equivalent subtractions

You ruled out the possibility of 0/0 and ±∞/±∞ by stating that faculty[] is correctly initialized.

The variable sign is always 1.0 or -1.0, so it cannot generate a NaN through the * operation.

What remains is the + operation, if tempExp ever becomes ±∞.

So probably tempExp is too large on entry to your function and becomes ±∞, which makes sum ±∞ too. On the next iteration you trigger the NaN-generating operation ∞ + (−∞), because you multiply one side of the addition by sign, and sign switches between positive and negative on each even iteration.

You're trying to approximate cos(x) around 0.0. So you should use the properties of the cos() function to reduce your input value to a value near 0.0. Ideally in the range [0, pi/4]. For instance, remove multiples of 2*pi, and get the values of cos() in [pi/4, pi/2] by computing sin(x) around 0.0 and so on.

fjardon

What can go dramatically wrong is a loss of precision. cos(x) is usually implemented by range reduction followed by a dedicated implementation for the range [0, pi/2]. Range reduction uses cos(x + 2*pi) = cos(x), but this reduction isn't perfect. For starters, pi cannot be represented exactly in finite precision.

Now what happens if you try something as absurd as cos(1<<30) ? It's quite possible that the range reduction algorithm introduces an error in x that's larger than 2*pi, in which case the outcome is meaningless. Returning NaN in such cases is reasonable.

MSalters