
Back when Intel first designed the 8087, why did they choose to organize the floating-point registers as a stack? What possible advantage could be gained from such a design? It seems much less flexible and harder to work with than allowing arbitrary registers to be used as source and destination operands.

asked by Alex D
  • I suggest you ask this on the [Electrical Engineering](http://electronics.stackexchange.com/) stack, but I believe it would boil down to the limited transistor count. According to [Wikipedia](http://en.wikipedia.org/wiki/Transistor_count), the 8080 had 4,500 transistors; by the time of the 8088 that was up to 29,000, but I don't know if it's directly relevant to your question. – Elliott Frisch Oct 18 '14 at 20:36
  • @ElliottFrisch, this is not a question about IC design, but about instruction set design. However, if the answer is "because it was easier to implement in silicon", please go ahead and leave that as an answer. I don't see why that would be the case, though. – Alex D Oct 18 '14 at 20:42
  • See also [this document](http://www.cims.nyu.edu/~dbindel/class/cs279/87stack.pdf) (found through [Wikipedia](http://en.wikipedia.org/wiki/X87)). – Jester Oct 18 '14 at 23:31
  • You're asking us to look back in history (without the knowledge gained in technology since) to determine why a decision was made. Hindsight is 20/20, but there's no way to foretell the future unless you're a psychic (if in fact they exist). – Ken White Oct 19 '14 at 05:34
  • @KenWhite, thank you for your feedback. This question is not asking others to be psychics or to read the designers' minds. The idea was: design choices have pros/cons, and an experienced assembly programmer (which I am not, but was hoping to find) could be expected to understand those pros/cons. For example, if you ask me about design choices made in the Ruby language, I can usually provide a succinct explanation, including subtle points which you would probably not figure out unless you have used the language for several years. – Alex D Oct 19 '14 at 05:47
  • @Alex: However, you're asking about decisions made back in the late 1980s/early 1990s, when processor architecture was totally different than today. In 1990, could you have known that this site would exist so you could ask this question here? I can provide a succinct answer to that question: No, I would not have known. You're asking for a discussion of why a decision was made at a point in time that is long past, and discussion questions are not appropriate here. – Ken White Oct 19 '14 at 05:54
  • The 8087 is even older than that; it was announced in 1980. – Ross Ridge Oct 19 '14 at 06:05
  • @KenWhite, I respectfully disagree. I wasn't looking for discussion, but for a clear, solid answer, which has been provided (see below). This isn't a question about a dead, irrelevant architecture, but one which is still alive today. My computer's CPU can execute 8087 instructions, and the web browser I am using to post this comment *uses* 8087 instructions (I checked). – Alex D Oct 19 '14 at 07:04
  • Yes, it's still alive today because of the decisions made back then, when we didn't have the knowledge we have now. The "architecture that is still alive today" clearly isn't "dead and irrelevant". Your computer can clearly execute the 8087 instructions based on the decisions made a couple of decades ago, which is a tribute to the decisions made at that time. The clear, solid answer: It was the solution available at the time given the technology that existed at that time. A discussion now of why it was appropriate at that time is a history review. – Ken White Oct 19 '14 at 07:13

1 Answer


The article "On the Advantages of the 8087's Stack", shared in the comments by @Jester, explains the thinking of the designers. A summary of why they organized the floating-point registers as a stack:

  1. Potentially, it could have made procedure calls more efficient, since (in theory) neither callers nor callees would have to explicitly save and restore FP registers. A callee that needed to do FP calculations would simply push its operands onto the register stack, do its calculations, and pop the results off when done, automatically restoring the caller's x87 state. (This is essentially how the machine stack is used for function parameters, return values and local variables; see the first sketch after this list.)

  2. Given the way instructions were already encoded on the 8086/8088, and the number of opcodes already in use, they could only provide one-operand instructions for the 8087, not two-operand ones. A one-operand format works poorly with a flat register file, but fits a stack naturally, since the stack top can serve as the implicit second operand and destination (see the second sketch after this list).

  3. They expected the FXCH instruction to make it simple enough to rearrange the x87 register stack at will, so that arbitrary pairs of values could be used as operands when needed. FXCH is also a cheap operation.
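
To illustrate point 1, here is a minimal sketch of the intended calling discipline, in NASM syntax (the routine and variable names are hypothetical, not from the article): the callee pushes its operands, works on the stack top, and pops down to its results, leaving whatever the caller had on the stack untouched beneath them.

```nasm
; Hypothetical leaf routine computing a*a + b*b.
; It makes exactly one net push (the return value), so the
; caller's x87 registers are preserved below it -- no explicit
; save/restore needed.
sum_of_squares:
        fld     qword [a]       ; push a          -> st0 = a
        fmul    st0, st0        ; st0 = a*a
        fld     qword [b]       ; push b          -> st0 = b, st1 = a*a
        fmul    st0, st0        ; st0 = b*b
        faddp   st1, st0        ; add and pop     -> st0 = a*a + b*b
        ret                     ; result left in st0
```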

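For points 2 and 3, a second sketch (again NASM syntax, hypothetical values): every x87 arithmetic instruction names at most one explicit operand, with the stack top `st0` implicit, and FXCH brings any other slot to the top when the operands are not already in the right order.

```nasm
; One-operand encodings: st0 is always an implicit operand.
        fadd    qword [b]       ; st0 = st0 + b    (one memory operand)
        fsqrt                   ; st0 = sqrt(st0)  (no operand at all)

; Suppose the stack holds st0 = y, st1 = x, but we need x / y:
        fxch    st1             ; swap             -> st0 = x, st1 = y
        fdiv    st1             ; st0 = st0 / st1 = x / y
```
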
answered by Alex D
  • Am I the only person who thinks that the 80x87 was a good architecture, but was the victim of some rather crummy compilers? If a compiler doesn't allow the declaration of extended-precision variables, and sometimes leaves 80-bit values in registers but sometimes "spills" them as 32 or 64 bits, the resulting semantics are going to be crummy, but the 80x87 designers aren't to blame. I feel sad when people curse the 80x87 for the fact that the product of two floats often but not always behaves as Extended, rather than cursing the language which doesn't let them consistently use Extended. – supercat Oct 19 '14 at 18:36
  • @supercat, very interesting, I didn't know that such a problem existed. I don't think the C standard specifies how many bits `float`s and `double`s should be stored in, or even that they must use IEEE format, so it seems that a compliant C compiler *could* use 80-bit `double`s. Correct me if I'm wrong. – Alex D Oct 19 '14 at 19:48
  • For whatever reason, many C compilers for x86 used 64 bits for both `double` and `long double` types and failed to provide any 80-bit data type; this meant that if e.g. one wanted to compute `x=sin(a)+b; y=sin(a)+c;` one either had to compute sin(a) twice or else lose precision when storing it to a temporary variable. To make things worse, even if the `sin(a)` operation appeared twice as shown above, some optimizing compilers would save the intermediate computation as a 64-bit value, use the 80-bit value in computing x, and use the 64-bit value in computing y. – supercat Oct 19 '14 at 20:04
  • Thus, if a=1.1; b=0.875; c=0.875; the above computation might compute `x` using a value for sin(a) of 0.8912073600614353**3995** while the computation of `y` used 0.8912073600614353**0527**. Subtracting 0.875 from those values would yield different `double` values. Thus, even though `b` and `c` were equal, `x` and `y` would end up unequal, and programmers would curse the fact that `x` was computed with "extra" precision. – supercat Oct 19 '14 at 20:06
  • Of course, if the language had implemented `long double` to be 80 bits, and allowed the programmer to write `t=sin(a); x=t+b; y=t+c;` then the results would have been consistent. The question of whether `sin(a)` should return a value rounded to 64-bit precision might have been a bit nebulous, but copying the value--whatever it was--to `t` would have ensured that `x` and `y` would both be evaluated using the same value as would have been used with the original expressions. – supercat Oct 19 '14 at 20:12
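
To make supercat's scenario concrete, here is a hedged sketch of the code such a compiler might emit (NASM syntax; the variable names are hypothetical). `sin(a)` is computed once at 80-bit precision, but one use reads the live register while the other reads a 64-bit spill, so `x` and `y` can disagree even when `b == c`:

```nasm
        fld     qword [a]
        fsin                    ; st0 = sin(a), 80-bit extended precision
        fst     qword [tmp]     ; spill a copy, rounded to 64-bit double
        fadd    qword [b]       ; x uses the full 80-bit value
        fstp    qword [x]
        fld     qword [tmp]     ; reload the rounded 64-bit copy
        fadd    qword [c]       ; y uses the rounded value
        fstp    qword [y]
```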
  • @supercat: you're not the only one who thinks that x87's extra precision is a good feature on its own. Totally agreed with your analysis that, if used carefully, extra temporary precision often helps. But trying to oversimplify floating point for portable languages resulted in a mess, especially with optimizing compilers that don't always round even when the language says they must (e.g. gcc), because x87 doesn't have a cheap way to do that without a store/reload. Deterministic FP is hard. – Peter Cordes Apr 10 '20 at 19:25
  • (The stack design is still a downside for modern CPUs, though. It clearly made sense at the time, but it didn't age well. Extra precision is orthogonal to having a register stack, so we could have had extra precision in a flat register file, plus a fast(?) instruction that rounded. Hmm, unless you truncate, it might have to round away from 0 and roll over into the exponent, so it might not be as fast as you'd like, but it could still have been way faster than `gcc -ffloat-store`, without having to always disable the extra precision.) – Peter Cordes Apr 10 '20 at 19:28
  • @PeterCordes: There were some missteps in the design of the 8087, but what I find perhaps most ironic is that people think of 80-bit `long double` as being designed around the 8087, when it actually offers even bigger benefits on processors *without* floating-point units. IMHO, C should have defined a `long float` type, whose precision and range could be anywhere between that of `float` and `long double`, and then specified that computations on `float` promote to `long float`, and `double` to `long double`. Combine that with a syntax to specify a list of acceptable types for... – supercat Apr 10 '20 at 20:34
  • ...variadic argument prototypes, and floating-point could have been made to work much more consistently. – supercat Apr 10 '20 at 20:34