Suppose we have the set of floating-point numbers with an m-bit mantissa and e bits for the exponent. Suppose moreover that we want to approximate a function f.

From the theory we know that usually a "range-reduced function" is used, and the value of the full function is then derived from it.

For example, let x = (sx, ex, mx) (sign, exponent and mantissa); then log2(x) = ex + log2(1.mx), so the range-reduced function is log2(1.mx).
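
To make the setup concrete, here is a minimal sketch of this decomposition in C. `log2_reduced` is a hypothetical placeholder for the approximation on [1, 2); here it just calls the library `log2` so the sketch compiles:

```c
#include <math.h>

/* Stand-in for the real reduced approximation: here just the library
   log2, so the sketch compiles.  The real version would be a polynomial
   or table whose required ulp accuracy is exactly the question. */
static double log2_reduced(double t) { return log2(t); }

/* Sketch of log2(x) = ex + log2(1.mx).  Special cases (x <= 0,
   subnormals, inf/NaN) are deliberately ignored. */
double my_log2(double x)
{
    int    ex = ilogb(x);        /* unbiased exponent ex */
    double t  = scalbn(x, -ex);  /* t = 1.mx in [1, 2)   */
    return (double)ex + log2_reduced(t);
}
```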

At present I have implemented reciprocal, square root, log2 and exp2; recently I've started working on the trigonometric functions. But I was wondering: given a global error bound (ulp error especially), is it possible to derive an error bound for the range-reduced function? Is there some study about this kind of problem? Taking log2(x) as an example, I would like to be able to say...

"ok i want log2(x) with k ulp error, to achieve this given our floating point system we need to approximate log2(1.mx) with p ulp error"

Remember that, as I said, we know we are working with floating-point numbers, but the format is generic: it could be the classic F32, but also, for example, e = 10, m = 8, and so on.

I can't actually find any reference that presents this kind of study. The reference I have (i.e. the Muller book) doesn't treat the topic in this way, so I was looking for a paper or something similar. Do you know of any reference?

I'm also trying to derive such a bound by myself, but it is not easy...
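
The furthest I get is a crude propagation step like the following (my own sketch, quite possibly missing something):

```latex
% Write y = e_x + r with r = \log_2(1.m_x) \in [0, 1).
% If the reduced function has absolute error \delta
% (with \delta \approx p \cdot \mathrm{ulp}(r) if the reduced error
% is given in ulps), and the final addition is rounded, then
\bigl|\,\mathrm{fl}(e_x + \hat{r}) - y\,\bigr|
  \le \delta + \tfrac{1}{2}\,\mathrm{ulp}(y),
\qquad\text{hence}\qquad
k \le \frac{\delta}{\mathrm{ulp}(y)} + \tfrac{1}{2}.
```

What defeats me is bounding ulp(y) from below: for x slightly below 1 we have ex = -1 and r close to 1, so ex + r cancels, ulp(y) collapses, and a fixed p-ulp bound on the reduced function alone doesn't seem to give a uniform k.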

user8469759

1 Answer

There is a description of current practice, along with a proposed improvement and an error analysis, at https://hal.inria.fr/ensl-00086904/document. The description of current practice appears consistent with the overview at https://docs.oracle.com/cd/E37069_01/html/E39019/z4000ac119729.html, which in turn matches my memory that the most talked-about problem is the mod-pi range reduction of the trigonometric functions.
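
For a feel of why the mod-pi case is hard, here is a rough sketch (not taken from the paper; a generic Cody-Waite-style two-constant reduction) of the usual approach and its failure mode:

```c
#include <math.h>

/* pi split into a leading part and a correction term.  A real
   Cody-Waite implementation chooses pi_hi with trailing zero bits so
   that k * pi_hi is exact for small k; this sketch glosses over that. */
static const double pi_hi = 3.141592653589793116;       /* pi rounded to double     */
static const double pi_lo = 1.2246467991473531772e-16;  /* approximately pi - pi_hi */

/* Reduce x to roughly [-pi/2, pi/2] by subtracting the nearest
   multiple of pi in two steps. */
double reduce_mod_pi(double x)
{
    double k = nearbyint(x / pi_hi);     /* nearest multiple of pi */
    return (x - k * pi_hi) - k * pi_lo;  /* two-step subtraction   */
}
```

Once |k| is large enough that k * pi needs more bits than pi_hi + pi_lo carry, the reduced argument can be wrong in every bit, which is what Payne-Hanek-style reductions (and the paper's improvement) are designed to avoid.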

I think IEEE floating point was a big step forward simply because it standardized things at a time when there was a variety of computer architectures, lowering the risks of porting code between them. But the accuracy requirements it implies may have been overkill: for many problems the constraint on the accuracy of the output is the accuracy of the input data, not the accuracy of the intermediate calculations.

mcdowella
  • Is it my impression, or does this paper just give an example with the Payne-Hanek algorithm? But anyway, the idea should be to derive the worst case for the range reduction and then derive the worst error due to the approximation? – user8469759 Jun 16 '15 at 10:03
  • Section 2 describes their new idea, although they fall back to Payne and Hanek for some inputs. For the trigonometric functions, where the reduction is mod pi, they seem to be saying that they can get an accurate range reduction (Section 1, Introduction), which I think means they use enough precision to get the same reduction an arbitrary-precision arithmetic package would. When they talk about a worst case, they mean the highest amount of work required to get the right result, not a worst error. So only the calculation on the reduced value contributes to the error. – mcdowella Jun 16 '15 at 18:50