I'm following the algorithms provided by this paper by Andrew Thall describing algorithms for performing math using the df64 data type, a pair of 32-bit floating point numbers used to emulate the precision of a 64-bit floating point number. However, there appears to be some inconsistencies (mistakes?) in how they've written their Division and Square Root functions.
This is how the Division function is written in the paper:
float2 df64_div(float2 B, float2 A) {
float xn = 1.0f / A.x;
float yn = B.x * xn;
float diff = (df64_diff(B, df64_mult(A, yn))).x;
float2 prod = twoProd(xn, diffTerm);
return df64_add(yn, prodTerm);
}
The language used to write this code appears to be Cg, for reference, although you should be able to interpret this code in C++ if you treat float2
as though it's merely an alias for struct float2{float x, y;};
, with some extra syntax to support arithmetic operations directly on the type.
For reference, these are the headers of the functions being used in this code:
float2 df64_add(float2 a, float2 b);
float2 df64_mult(float2 a, float2 b);
float2 df64_diff(/*Not provided...*/);
float2 twoProd(float a, float b);
So a couple of problems immediately stand out:
diffTerm
andprodTerm
are never defined. There's two variables,diff
andprod
which are defined, but it's not certain that these are the terms that were intended in this code.- No declaration of
df64_diff
is provided. Presumably this is meant to support subtraction; but again, this is not clear. df64_mult
is a function that does not accept a 32-bit float as an argument; it only supports two pairs of 32-bit floats as arguments. It is not clear how the paper expects this function call to compile- Same also for
df64_add
, which also only accepts pairs of 32-bit floats as arguments, but is here invoked with the first argument being only a single 32-bit float.
I'm making an educated guess that this is a correct implementation of this code, but because even a correct implementation of this function has unavoidable errors in the computation, I can't tell if it's correct, even if it gives values that "seem" correct:
float2 df64_div(float2 B, float2 A) {
float xn = 1.0f / A.x;
float yn = B.x * xn;
float diff = (df64_diff(B, df64_mult(A, float2(yn, 0)))).x;
float2 prod = twoProd(xn, diff);
return df64_add(float2(yn, 0), prod);
}
float2 df64_diff(float2 a, float2 b) {
return df64_add(a, float2(-b.x, -b.y));
}
So my question is this: is the written implementation of this algorithm as seen in the paper accurate (because it depends on behavior of the Cg language that I'm not aware of?), or isn't it? And regardless, is my interpolation of that code a correct implementation of the division algorithm described in the paper?
Note: my target language is C++, so although the differences between the languages (for this kind of algorithm) are minor, my code is written in C++ and I'm looking for correctness for the C++ language.