
I'm wondering why the FYL2XP1 instruction on the x86 architecture computes exactly the mathematical formula y · log2(x + 1).

What's special about this formula?

Peter Cordes
MZ97
  • Why would you untag x87 on a question about an x87 FPU instruction? That's the most specific and relevant tag here. – Peter Cordes Feb 19 '20 at 19:50
  • this is useful, so the C++ standard library also includes it: [std::log1p](https://en.cppreference.com/w/cpp/numeric/math/log1p) and [std::expm1](https://en.cppreference.com/w/cpp/numeric/math/expm1) – phuclv Jun 05 '22 at 12:41

2 Answers


The y operand is usually a compile-time constant; for the moment, forget about the x + 1.

Since log_b(x) = log_b(2) · log_2(x), the instruction lets you compute the logarithm of x + 1 in any base b.
Note that log_b(2) can be treated as a constant, since it is seldom necessary to compute a logarithm whose base is only known at run time.
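
As an illustration, here is a minimal C sketch of that usage, with the library function `log1p` standing in for the hardware's log2(x + 1) core (on x87, the constant y would be pushed with `FLDLN2` or `FLDLG2` and a single `FYL2XP1` would do the rest); the specific constants and variable names are just for the example:

```c
/* A sketch of how a constant y turns y * log2(x + 1) into a logarithm in any
   base; log1p() stands in for the hardware log2(x + 1) core. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double LN2 = 0.69314718055994530942;   /* ln(2),    the y for base e  */
    const double LG2 = 0.30102999566398119521;   /* log10(2), the y for base 10 */

    double x = 3.0e-10;                  /* small offset from 1 */
    double log2_x1 = log1p(x) / LN2;     /* log2(x + 1), without ever forming x + 1 */

    printf("ln(1+x)    = %.17g\n", LN2 * log2_x1);   /* y = ln(2)    */
    printf("log10(1+x) = %.17g\n", LG2 * log2_x1);   /* y = log10(2) */
    return 0;
}
```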

FYL2XP1 and FYL2X are the only two x87 instructions that compute the logarithm.
If the logarithm were an algebraic function, a single instruction would suffice; but since it is transcendental, Intel provided two versions.

FYL2X works on the full domain of the logarithm, but it is not perfectly accurate over that whole range, particularly for very small values of x. It is probably also slower: I believe it has to do a range reduction, use a truncated Taylor expansion or a Padé approximation, and then improve the accuracy with a table lookup.

FYL2XP1 instead works only for inputs in the small range ±(1 - sqrt(2)/2).
This should be faster (no range reduction is needed) and, more importantly, over this input range the approximation method used should have a precision equal to or greater than the x87 80-bit floating-point precision.

Quoting Intel's manual:

> This instruction provides optimal accuracy for values of epsilon [the value in register ST(0)] that are close to 0. For small epsilon (ε) values, more significant digits can be retained by using the FYL2XP1 instruction than by using (ε+1) as an argument to the FYL2X instruction.

@Mysticial's comment is spot on:
The algorithm used by FYL2X probably relies, after all the other necessary steps, on an approximation formula for log(x + 1).
To turn that into a formula for log(x), one must be subtracted from the input. So to compute log(ε + 1) with FYL2X you have to form ε + 1 yourself, and that addition is where precision is lost when ε is very small: the big difference between the exponents of ε and 1 shifts most of ε's digits off to the right.
FYL2XP1 takes ε directly, never forms ε + 1, and hence doesn't lose precision.
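
A quick C illustration of that loss, with `log1p` (the library counterpart mentioned in the comments under the question) playing the role of FYL2XP1 and `log2(1.0 + eps)` playing the role of the FYL2X route; the value of `eps` is just an example:

```c
/* eps is far smaller than 1 ulp of 1.0, so it vanishes when 1.0 + eps is formed. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double eps = 1e-20;

    printf("log2(1.0 + eps)  = %.17g\n", log2(1.0 + eps));   /* prints 0: eps was lost */
    printf("log1p(eps)/ln(2) = %.17g\n",
           log1p(eps) / 0.69314718055994530942);             /* ~1.44e-20, correct */
    return 0;
}
```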

Sep Roland
Margaret Bloom
  • The Taylor series for `log(x)` is usually done about `x = 1`. So every term will have `x - 1`. If you're trying to compute `log(x + 1)` for a very small `x`, a direct call as `log(x + 1)` will result in `x + 1 - 1` which will drop all the low-order bits of `x` - thereby losing precision if `x` is really small. A built-in `log(x+1)` can then elide this `x + 1 - 1` step and preserve the full precision. – Mysticial Oct 10 '18 at 18:51
  • @Mysticial Yes, good point! There are quite a few hints in the manuals to believe the instructions are using an approximation around x=1, even though the actual algorithm is not revealed. But your phrasing is excellent because it focuses on the real problem here: approximations done about x=1 will have an x-1 term and 1 >> x means a loss in precision. Thank you very much! P.S. Do you happen to know why Intel chose that specific input range for `FYL2XP1`? – Margaret Bloom Oct 10 '18 at 19:07
  • I don't know for sure, but sqrt(2)/2 is 1/sqrt(2) which ensures a convergence rate of 0.5 bits/term or better for the series. Maybe the tables are only hard-coded to 106 or 128 terms? The problem is that if the instruction is supposed to give a full 64-bits of precision for 80-bit FP, it's going to have multiple ulps of error unless it works with an even higher internal precision - regardless of the convergence rate. – Mysticial Oct 10 '18 at 19:20
  • @Mysticial Thanks but I'm afraid I don't follow you. AFAIK the convergence rate is a property of the series not of the input domain. Where does the 0.5 bits/term figure come from? – Margaret Bloom Oct 10 '18 at 19:50
  • @Mysticial IIRC, the original 8087 carried an additional three bits for internal computations, i.e. 67 mantissa bits in all. Check John F. Palmer, Stephen P. Morse, "The 8087 Primer", Wiley 1984 if you need to know for sure. Error in FYL2XP1 < 1 ulp if y=1, I think. For the x87 FPU inside the AMD Athlon Processor, we used 68-bit internal precision. See: Stuart F. Oberman, "Floating Point Division and Square Root Algorithms and Implementation in the AMD-K7™ Microprocessor", In *Proceedings 14th IEEE Symposium on Computer Arithmetic*, pp. 106-115. IEEE 1999. Error in FYL2XP1 < 1 ulp for any x, y – njuffa Oct 10 '18 at 19:59
  • @MargaretBloom I was assuming the standard Taylor Series for log(x - 1). That one will converge at 0.5 bits/term with the `1 +/- 1/sqrt(2)` boundaries. Of course we don't know for sure. – Mysticial Oct 10 '18 at 20:48
  • @njuffa Woah. That's good to know. Didn't realize it! Thanks. – Mysticial Oct 10 '18 at 20:48
  • @Mysticial Eh, that's what I don't follow: how from the Taylor series for log(x + 1) within that range one arrives at 0.5 bits/term. The i-th term approximately corrects the series by 2^-(i+1). If, when you have time, you could explain it I'd be grateful :) It's just curiosity – Margaret Bloom Oct 10 '18 at 21:29
  • The series is `log(x+1) = x - x^2/2 + x^3/3 - x^4/4...` The denominator is linear so it's negligible. The `x = 1/sqrt(2)` will drop by a factor of 2 every two terms. (since `x^2 = 1/2`) Thus 0.5 bits/term. Likewise for a given `x`, the convergence rate is `-log(|x|)/log(2)` bits/term where `|x| < 1` over the complex plane. Outside those bounds it diverges. – Mysticial Oct 10 '18 at 21:59
  • @Mysticial Taylor expansions are rarely used to compute transcendentals in fixed-sized FP arithmetic. Rafi Nave, "Implementation of Transcendental Functions on a Numerics Processor", *Microprocessing and Microprogramming* 11 (1983) 221-225 states that the 8087 used a combination of CORDIC and rational approximations (presumably of the minimax kind) to compute transcendental functions. For the Athlon, for log2(1+x) on the primary approximation interval I made use of the identity log(1+x) = 2*atanh(x/(x+2)), and used a rational minimax approximation to compute atanh(). Very few terms are needed. – njuffa Oct 10 '18 at 22:07
  • @njuffa Good to know. I figured CORDIC would come into play at some point. Admittedly, most of my experience is on the arbitrary precision side. – Mysticial Oct 10 '18 at 22:10
  • @Mysticial Thank you very much, that makes perfect sense :) I figured you meant how fast the error was halving (the convergence rate is how fast the error goes to zero) due to the use of the bits/term unit but I kept the denominator and got tricked by it since it makes the series half at about 1 bit/term for the first 10 terms. – Margaret Bloom Oct 10 '18 at 23:15

As far as I understand, the common description of this instruction contains an inaccuracy. Historically, this instruction did not work in the range ±(1 - sqrt(2)/2), but rather in the range sqrt(2)/2 - 1 <= x <= sqrt(2) - 1, which is a convenient range for approximating this function with a series expansion (e.g. by the Chebyshev method). This seems to be the case because the left and right boundaries of the range for the logarithm's argument (x + 1) then differ by a factor of 2, which makes it relatively easy to extend the result of this function to any x > -1 (see the sketch below).
It is important to note that in all modern processors, the fyl2xp1 instruction supports any x > -1, and this feature can be safely used. But you need to remember that this instruction does not throw an exception in case of an erroneous argument.
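
For illustration only, here is a minimal C sketch of the reduction hinted at above (certainly not Intel's actual algorithm): any 1 + x > 0 is brought back into a factor-of-2 mantissa range by pulling out the exponent, and `log1p` stands in for the hardware's core approximation. The `log2p1` helper name and the pass-through threshold are just assumptions of the sketch.

```c
/* A hypothetical log2p1() helper: computes log2(1 + x) for any x > -1 by
   reducing the argument into the factor-of-2 range discussed above.
   log1p() stands in for the hardware's core approximation. */
#include <math.h>
#include <stdio.h>

double log2p1(double x)
{
    const double SQRT1_2 = 0.70710678118654752440;  /* sqrt(2)/2 */
    const double SQRT2   = 1.41421356237309504880;  /* sqrt(2)   */
    const double LN2     = 0.69314718055994530942;  /* ln(2)     */

    /* Already inside the small range sqrt(2)/2 - 1 <= x <= sqrt(2) - 1:
       hand x to the core directly (on x87 this would be one FYL2XP1). */
    if (x >= SQRT1_2 - 1.0 && x <= SQRT2 - 1.0)
        return log1p(x) / LN2;

    /* Otherwise pull out the exponent: 1 + x = m * 2^e with m in [0.5, 1), */
    int e;
    double m = frexp(1.0 + x, &e);
    /* then rescale so that m lies in [sqrt(2)/2, sqrt(2)). */
    if (m < SQRT1_2) { m *= 2.0; e -= 1; }
    /* m - 1 is now back in the small range, and log2(1 + x) = e + log2(m). */
    return (double)e + log1p(m - 1.0) / LN2;
}

int main(void)
{
    printf("%.17g\n", log2p1(7.0));     /* expect 3                       */
    printf("%.17g\n", log2p1(1e-20));   /* ~1.44e-20                      */
    printf("%.17g\n", log2p1(-0.999));  /* expect log2(0.001) ~ -9.9658   */
    return 0;
}
```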

Sep Roland
aenigma
  • While plausible, in Intel's processor documentation for the 8087 / 80287 the argument restrictions on `FYL2XP1` are clearly stated as 0 ≤ |ST(0)| < 1 - sqrt(2)/2. Consistent with this, the publication by the lead engineer: R. Nave, "Implementation of Transcendental Functions on a Numerics Processor", *Microprocessing and Microprogramming*, Vol. 11, No. 3-4, March-April 1983, pp. 221-225, states the restriction as sqrt(2)/2 - 1 < X < 1 - sqrt(2)/2. I can confirm from personal implementation experience that the permissible range for X was extended to all X > -1 in later x87 implementations. – njuffa Jun 03 '22 at 07:48
  • J.F. Palmer and S.P. Morse, "The 8087 Primer", Wiley 1984, likewise states the argument restriction on `FYL2XP1` as |x| < 1 - sqrt(1/2). – njuffa Jun 03 '22 at 08:02
  • Unfortunately, the correctness of the range given in these sources is almost impossible to verify - a working system with an 8087 coprocessor is needed. – aenigma Jun 04 '22 at 14:03