Extended floating point precision on mobile GPU

Question

I'm trying to compute the gradient vector field of an image on the GPU using OpenGL ES 2.0. I found a CPU implementation of it, which I use as a reference to compare my GPU implementation against. The challenge here is that the CPU implementation relies on Java type float (32 bits) whereas my GPU implementation is using lowp float (8 bits). I know I could use mediump or highp to get better results, but still I would like to keep on using lowp float to make sure my code will be able to run on the poorest possible hardware.

The first few steps for calculating the gradient vector field are very simple:

1. Compute the normalised greyscale: (red + green + blue) / 3.0
2. Compute the edge map: (right pixel - left pixel) / 2.0 and (up pixel - down pixel) / 2.0
3. Compute the Laplacian (a bit more complex, but there is no need to go into the details of this now)
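For comparison, steps 1 and 2 can be sketched on the CPU in C using 32-bit floats (the same width as Java's float). This is a hypothetical reference, not the CPU implementation mentioned above: the function names, the row-major RGB layout with channels in [0, 1], the clamp-to-edge border handling and treating row y - 1 as "up" are all assumptions.

```c
#include <stddef.h>

/* Hypothetical CPU reference for steps 1 and 2 in 32-bit float.
 * Assumptions: row-major RGB input in [0, 1], borders clamp to the
 * nearest valid pixel (like GL_CLAMP_TO_EDGE), "up" means row y - 1. */

/* Step 1: normalised greyscale (red + green + blue) / 3.0 */
void greyscale(const float *rgb, float *grey, size_t w, size_t h)
{
    for (size_t i = 0; i < w * h; ++i)
        grey[i] = (rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2]) / 3.0f;
}

/* Clamp a possibly out-of-range index into [0, n - 1] */
static size_t clamp_index(long v, size_t n)
{
    if (v < 0) return 0;
    if (v >= (long)n) return n - 1;
    return (size_t)v;
}

/* Step 2: central differences (right - left) / 2 and (up - down) / 2 */
void edge_map(const float *grey, float *gx, float *gy, size_t w, size_t h)
{
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            size_t l = clamp_index((long)x - 1, w), r = clamp_index((long)x + 1, w);
            size_t u = clamp_index((long)y - 1, h), d = clamp_index((long)y + 1, h);
            gx[y * w + x] = (grey[y * w + r] - grey[y * w + l]) / 2.0f;
            gy[y * w + x] = (grey[u * w + x] - grey[d * w + x]) / 2.0f;
        }
    }
}
```

A one-row image going from black through mid-grey to white gives a horizontal gradient of 0.25 at both ends and 0.5 in the middle, and a vertical gradient of 0 everywhere.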

Currently, without doing anything fancy, I'm able to mimic step 1 exactly, such that the image result from the CPU implementation is the same as the one from the GPU.

Unfortunately, I'm already stuck on step 2, because my edge map calculation is not accurate enough on the GPU.

So I've tried to implement extended-precision floating point, inspired by http://andrewthall.org/papers/df64_qf128.pdf .

I'm fairly new to OpenGL ES, so I'm not even sure I did things correctly here, but below are the operations I intended to code in order to work around the precision loss I'm currently suffering from.

    vec2 split(float a)
    {
        float   t   =   a * (2e-8+1.0);
        float   aHi =   t - (t -a);
        float   aLo =   a - aHi;

        return vec2(aHi,aLo);
    }

    vec2 twoProd(float a, float b)
    {
        float   p   = a * b;
        vec2    aS  = split(a);
        vec2    bS  = split(b);
        float   err = ( ( (aS.x * bS.x) - p) + (aS.x * bS.y) + (aS.y * bS.x) ) + (aS.y * bS.y);

        return vec2(p,err);
    }

    vec2 FMAtwoProd(float a,float b)
    {
        float   x   =   a * b;
        float   y   =   a * b - x;

        return vec2(x,y);
    }

    vec2 div(vec2 a, vec2 b)
    {
        float   q   = a.x / b.x;
        vec2    res = twoProd( q , b.x );
        float   r   = ( a.x - res.x ) - res.y ;

        return vec2(q,r);
    }

    vec2 div(vec2 a, float b)
    {
        return div(a,split(b));
    }

    vec2 quickTwoSum(float a,float b)
    {
        float   s   =   a + b;
        float   e   =   b - (s-a);

        return vec2(s,e);
    }

    vec2 twoSum(float a,float b)
    {
        float   s   =   a + b;
        float   v   =   s - a;
        float   e   =   ( a - (s - v)) + ( b - v );

        return vec2(s,e);
    }

    vec2 add(vec2 a, vec2 b)
    {
        vec2    s   =   twoSum(a.x , b.x);
        vec2    t   =   twoSum(a.y , b.y);

        s.y     +=  t.x;
        s       =   quickTwoSum(s.x,s.y);
        s.y     +=  t.y;
        s       =   quickTwoSum(s.x,s.y);

        return s;
    }

    vec2 add(vec2 a,float b)
    {
        return add(a,split(b));
    }

    vec2 mult2(vec2 a,vec2 b)
    {
        vec2    p   =   twoProd(a.x,b.x);
        p.y     +=  a.x * b.y;
        p.y     +=  a.y * b.x;
        p       =   quickTwoSum(p.x,p.y);

        return p;
    }

    vec2 mult(vec2 a,float b)
    {
        return mult2(a, split(b));
    }
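The error-free transformations in the shader can be checked on a CPU before debugging them on the GPU. Below is a minimal C port of twoSum and quickTwoSum, renamed two_sum and quick_two_sum, with a hypothetical df struct standing in for vec2. It assumes strict IEEE 754 single-precision evaluation (for example SSE on x86-64, without -ffast-math).

```c
/* CPU port of twoSum / quickTwoSum from the shader, assuming strict
 * IEEE 754 single-precision evaluation (no -ffast-math).
 * df is a hypothetical stand-in for the shader's vec2 (hi = x, lo = y). */
typedef struct { float hi, lo; } df;

/* Knuth's TwoSum: hi = fl(a + b) and lo is the exact rounding error,
 * so a + b == hi + lo holds exactly for any a and b. */
df two_sum(float a, float b)
{
    float s = a + b;
    float v = s - a;
    float e = (a - (s - v)) + (b - v);
    return (df){ s, e };
}

/* Dekker's Fast2Sum: fewer operations, but only exact when |a| >= |b|. */
df quick_two_sum(float a, float b)
{
    float s = a + b;
    float e = b - (s - a);
    return (df){ s, e };
}
```

Adding 1e-8 to 1.0 in single precision drops the small term from hi, but lo keeps it exactly, in either argument order. If a compiler reassociates these expressions (as -ffast-math permits), v and e can be simplified away and lo becomes zero, so the same possibility is worth ruling out for the shader compiler.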

Obviously, I must be doing something wrong here or missing some fairly fundamental concepts, as I'm getting the same results whether I use simple operations or my extended floating point operations...
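One likely contributor is the constant in split(). Assuming the GPU rounds like IEEE 754 single precision, 2e-8 + 1.0 is closer to 1.0 than the next float above 1.0 (about 1.0000001), so it rounds to exactly 1.0. Then t equals a, split() always returns (a, 0.0), every cross term in twoProd vanishes, and its error term reduces to (a * b) - p, which is zero once the product has been rounded. The usual Veltkamp/Dekker splitter for a p-bit significand is 2^ceil(p/2) + 1, i.e. 4097 for a 24-bit float; a GPU float with fewer significand bits would need a smaller constant. The C sketch below checks both constants; split_with() and the df struct are hypothetical names.

```c
/* Checks the splitter used by split() in the question, assuming IEEE 754
 * single precision. df and split_with() are hypothetical names. */
typedef struct { float hi, lo; } df;

/* Veltkamp/Dekker split: with a correct splitter, hi + lo == a exactly
 * and each half has few enough bits for exact partial products. */
df split_with(float a, float splitter)
{
    float t = a * splitter;
    float hi = t - (t - a);
    float lo = a - hi;
    return (df){ hi, lo };
}

/* The question's constant: 2e-8 + 1.0 rounds to exactly 1.0f */
float question_splitter(void) { return (float)(2e-8 + 1.0); }

/* 2^12 + 1 for a 24-bit significand (ceil(24 / 2) = 12) */
float float32_splitter(void) { return 4097.0f; }
```

With the question's constant, 0.1f splits into (0.1f, 0.0f); with 4097 the low part is non-zero and the two halves still sum back to 0.1f exactly.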

score 0 · Answer 1 · answered Feb 28 '15 at 21:33

The challenge here is that the CPU implementation relies on Java type float (32 bits) whereas my GPU implementation is using lowp float (8 bits).

lowp does not actually imply the number of bits used for floating-point arithmetic. It has more to do with the range of values that must be expressible and the minimum distinguishable value (precision); you can use these to work out a minimum number of bits, but GLSL never discusses it in those terms.
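As a worked example of turning those guarantees into a minimum bit count, take the minimum requirements for lowp and mediump floats in section 4.5.2 of the GLSL ES 1.00 specification: lowp covers the range (-2, 2) with an absolute precision of 2^-8, and mediump has a relative precision of 2^-10.

```latex
\begin{aligned}
\text{lowp:}\quad & \frac{2 - (-2)}{2^{-8}} = 2^{10} \text{ distinguishable steps} \;\Rightarrow\; \text{at least } 10 \text{ bits} \\
\text{mediump:}\quad & \text{relative precision } 2^{-10} \;\Rightarrow\; \text{at least } 10 \text{ fraction bits in the significand, plus sign and exponent}
\end{aligned}
```

Because the lowp guarantee is an absolute precision, a conforming lowp may behave more like a fixed-point format than like a shortened 32-bit float.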

Currently, without doing anything fancy, I'm able to mimic step 1 exactly, such that the image result from the CPU implementation is the same as the one from the GPU.

That is lucky, because an immediate problem with your description is that lowp is only guaranteed to represent values in the range [-2.0,2.0]. If you try to normalize a low-precision floating-point value by dividing it by 3 (as shown in step 1), that may or may not work, since the sum red+green+blue can reach 3.0. In the worst case this will not work, because the floating-point value will never reach 3.0. However, on some GPUs it may work, because there may be no difference between lowp and mediump, or a GPU's lowp may exceed the minimum requirements outlined in section 4.5.2 Precision Qualifiers of the GLSL ES 1.00 specification.
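To make the range and precision limits concrete, the sketch below emulates a worst-case lowp in C. The lowp() helper is hypothetical and rests on assumptions made for illustration: out-of-range values clamp to [-2, 2] (real hardware may behave differently or offer more range), and results round to the 2^-8 absolute precision with halves rounded away from zero. The other function names are also hypothetical.

```c
/* Hypothetical worst-case lowp: clamp to [-2, 2] (an assumption) and
 * round to a multiple of 2^-8, rounding halves away from zero. */
float lowp(float x)
{
    if (x > 2.0f) x = 2.0f;
    if (x < -2.0f) x = -2.0f;
    long steps = (long)(x * 256.0f + (x >= 0.0f ? 0.5f : -0.5f));
    return (float)steps / 256.0f;
}

/* Step 1 as written in the question: the channel sum can reach 3.0 */
float grey_sum_first(float r, float g, float b)
{
    float sum = lowp(lowp(r + g) + b);
    return lowp(sum / 3.0f);
}

/* Same formula with the division applied per channel, so no
 * intermediate exceeds 1.0 (at the cost of extra rounding steps) */
float grey_divide_first(float r, float g, float b)
{
    float third = lowp(1.0f / 3.0f);
    return lowp(lowp(lowp(r * third) + lowp(g * third)) + lowp(b * third));
}

/* Horizontal part of step 2: (right - left) / 2 */
float edge_x(float left, float right)
{
    return lowp(lowp(right - left) / 2.0f);
}
```

Under these assumptions a white pixel comes out of grey_sum_first as 171/256 (about 0.668) instead of 1.0, grey_divide_first gives 255/256, and edge_x(0, 1/256) returns 1/256 rather than 1/512, because halving a single 2^-8 step cannot be represented.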

... still I would like to keep on using lowp float to make sure my code will be able to run on the poorest possible hardware.

If you are targeting the lowest-end hardware possible, keep in mind that ES 2.0 requires mediump support in all shader stages. The only thing lowp might get you is improved performance on some GPUs, but any GPU that can host ES 2.0 supports medium-precision floating point, and your algorithm requires a greater range than lowp guarantees.
