I'm trying to compute the gradient vector field of an image on the gpu using opengl-es 2.0. I found a cpu implementation for it which i use as a compare to my gpu implementation. The challenge here is that the cpu implementation relies on java type float (32 bits) whereas my gpu implementation is using lowp float (8 bits). I know i could use mediump or highp, to get better results but still i would like to keep on using lowp float to make sure my code will be able to run on the poorest possible hardware.
The first few steps for calculating the gradient vector field are very simple:
- compute normalised greyscale (red+green+blue)/3.0
- compute edge map (right pixel-left pixel)/2.0 and (up pixel-down pixel)/2.0
- compute laplacian (a bit more complex but there is no need to get to the details of this now)
Currently, without doing anything fancy, i'm able to mimic exactly step 1 such that the image result from the cpu implementation is the same as the one from the gpu.
Unfortunately, i'm already stuck on step 2, because my edge map calculation is not accurate enough on the gpu.
So i've tried to implement an extended precision floating point, inspired from http://andrewthall.org/papers/df64_qf128.pdf .
I'm fairly new to opengl-es and so i'm not even sure i did things correctly here, but below are the operations i intended to code in order to work out this precision loss i'm currently suffering of.
vec2 split(float a)
{
float t = a * (2e-8+1.0);
float aHi = t - (t -a);
float aLo = a - aHi;
return vec2(aHi,aLo);
}
vec2 twoProd(float a, float b)
{
float p = a * b;
vec2 aS = split(a);
vec2 bS = split(b);
float err = ( ( (aS.x * bS.x) - p) + (aS.x * bS.y) + (aS.y * bS.x) ) + (aS.y * bS.y);
return vec2(p,err);
}
vec2 FMAtwoProd(float a,float b)
{
float x = a * b;
float y = a * b - x;
return vec2(x,y);
}
vec2 div(vec2 a, vec2 b)
{
float q = a.x / b.x;
vec2 res = twoProd( q , b.x );
float r = ( a.x - res.x ) - res.y ;
return vec2(q,r);
}
vec2 div(vec2 a, float b)
{
return div(a,split(b));
}
vec2 quickTwoSum(float a,float b)
{
float s = a + b;
float e = b - (s-a);
return vec2(s,e);
}
vec2 twoSum(float a,float b)
{
float s = a + b;
float v = s - a;
float e = ( a - (s - v)) + ( b - v );
return vec2(s,e);
}
vec2 add(vec2 a, vec2 b)
{
vec2 s = twoSum(a.x , b.x);
vec2 t = twoSum(a.y , b.y);
s.y += t.x;
s = quickTwoSum(s.x,s.y);
s.y += t.y;
s = quickTwoSum(s.x,s.y);
return s;
}
vec2 add(vec2 a,float b)
{
return add(a,split(b));
}
vec2 mult2(vec2 a,vec2 b)
{
vec2 p = twoProd(a.x,b.x);
p.y += a.x * b.y;
p.y += a.y * b.x;
p = quickTwoSum(p.x,p.y);
return p;
}
vec2 mult(vec2 a,float b)
{
return mult2(a, split(b));
}
Obviously, i must be doing something wrong here or miss some quite fundamental concepts as i'm getting the same results whether i'm using simple operations or my extended floating point operations...