
The 8086 uses 16-bit instructions, but each RAM address holds only 8 bits, so how does the CPU load programs from RAM? Does it load one address and then check whether the instruction needs 1, 2, or 3 bytes (e.g. moving an 8/16-bit immediate to a register) and then execute the operation, or am I wrong that one RAM 'space' is 16 bits wide?

  • The RAM is actually 24 bits. This is why it's paragraph aligned with a segment register plus offset. When you specify an eight bit address, it assumes the segment register is the base. – David Hoelzer Aug 24 '16 at 11:13

1 Answer


Many instructions are multi-byte, and yes that means they span two or more addresses.

The 8086's data bus is 16 bits wide, so it can load 16 bits (the bytes at two adjacent addresses) in a single operation. You're confusing byte-addressable memory with the bus width.
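As a rough illustration of the difference, here is a minimal Python sketch of byte-addressable memory read over a 16-bit bus. The names `memory` and `bus_read_word` are made up for this example, and it assumes an aligned (even-address) access and x86's little-endian byte order.

```python
# Byte-addressable memory: every address names exactly one byte.
memory = bytearray(16)
memory[0:3] = bytes([0xB8, 0x34, 0x12])  # mov ax, 0x1234 (opcode B8 + imm16)

def bus_read_word(mem, addr):
    """Toy model of one 16-bit transfer: two adjacent bytes, low byte first."""
    return mem[addr] | (mem[addr + 1] << 8)

# One word-sized read returns the opcode byte and the first immediate byte.
print(hex(bus_read_word(memory, 0)))  # 0x34b8
```

The instruction still occupies three separate addresses (0, 1 and 2); the bus width only determines how many of those bytes move per transfer.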

Does it load one address and then check if the instruction needs 1/2/3 bytes (e.g. moving an immediate to a register 8/16 bit)

It continually fetches instruction bytes into a 6-byte prefetch buffer (2 bytes at a time, because it's a 16-bit CPU with a 16-bit data bus) whenever the bus isn't busy with data accesses triggered by the instruction that's running.
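A toy Python model of that fill behaviour might look like the following. It is only a sketch: `simulate_prefetch` and the `bus_busy` list are hypothetical names, the rule "fetch only when at least 2 bytes are free" is an assumption, and the model never removes bytes from the queue, so it shows filling only.

```python
from collections import deque

QUEUE_SIZE = 6  # 8086 prefetch buffer size, as stated above
BUS_BYTES = 2   # bytes moved per fetch on a 16-bit data bus

def simulate_prefetch(code, bus_busy):
    """Each step, fetch 2 bytes if the bus is free and the queue has room.

    bus_busy is a hypothetical per-step list of booleans standing in for
    data accesses made by the instruction that is executing.
    """
    queue = deque()
    fetch_ptr = 0
    history = []
    for busy in bus_busy:
        room = QUEUE_SIZE - len(queue)
        if not busy and room >= BUS_BYTES and fetch_ptr < len(code):
            chunk = code[fetch_ptr:fetch_ptr + BUS_BYTES]
            queue.extend(chunk)
            fetch_ptr += len(chunk)
        history.append(len(queue))  # queue occupancy after this step
    return list(queue), history

code = bytes([0xC7, 0x06, 0x34, 0x12, 0x78, 0x56, 0x50, 0x90])
queue, history = simulate_prefetch(code, [False, True, False, False, False])
print(history)  # [2, 2, 4, 6, 6]
```

The second step fetches nothing because the bus is busy, and the last step fetches nothing because the queue is full.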

The buffer is large enough to hold the largest allowed 8086 instruction (see Note 1 below), excluding prefixes, which are handled one per clock cycle before the CPU gets to the opcode. When it's done executing the previous instruction, it looks at the buffer. See the link below for a better description, but it probably tries to decode the buffer as a whole instruction, or at least find an opcode, otherwise waits for the next fetch to try again. (I'm not sure how much it can pipeline fetching of later bytes for longer instructions; if it can start executing while that happens.)

Note 1: The 8088, with its 8-bit bus, shrinks the prefetch buffer to 4 bytes; see this retrocomputing Q&A. But apparently the 8088 has the same transistor layout except for the Bus Interface Unit (BIU). So it, and therefore the 8086, must not depend on being able to hold a whole instruction in the prefetch buffer, because the 8088 can execute `mov word [0x1234], 0x5678` (6 bytes: opcode + modrm + disp16 + imm16). The opcode + modrm is only 2 bytes, with more bytes for a disp8 or disp16 in the addressing mode and/or an imm8 or imm16 immediate, so presumably those can be fetched and decoded later.
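To make the byte count concrete, here is a small Python sketch that works out the length of a `mov r/m, imm` instruction from just its first two bytes. The helper names `disp_size` and `mov_imm_length` are invented for illustration, it handles only opcodes C6 and C7 without prefixes, and it is not meant to describe how the 8086 hardware actually decodes.

```python
def disp_size(modrm):
    """Displacement bytes implied by a 16-bit-mode ModRM byte."""
    mod = modrm >> 6
    rm = modrm & 0b111
    if mod == 0b00:
        return 2 if rm == 0b110 else 0  # mod=00, rm=110: bare disp16 address
    if mod == 0b01:
        return 1                         # disp8
    if mod == 0b10:
        return 2                         # disp16
    return 0                             # mod=11: register operand

def mov_imm_length(code):
    """Length of mov r/m, imm (C6 = imm8, C7 = imm16), no prefixes."""
    opcode, modrm = code[0], code[1]
    if opcode not in (0xC6, 0xC7):
        raise ValueError("this sketch only handles opcodes C6 and C7")
    imm = 1 if opcode == 0xC6 else 2
    return 2 + disp_size(modrm) + imm   # opcode + modrm + disp + imm

insn = bytes([0xC7, 0x06, 0x34, 0x12, 0x78, 0x56])  # mov word [0x1234], 0x5678
print(mov_imm_length(insn))  # 6
```

The total length is known once the opcode and modrm bytes have been seen, which fits the idea that the remaining displacement and immediate bytes could arrive later.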

This 8086 gate-level reverse-engineering article, Latches inside: Reverse-engineering the Intel 8086's instruction register, says the 8086's actual instruction register is 1 byte, holding the opcode of the currently-executing instruction. (It wasn't until later CPUs that any 0F xx 2-byte opcodes were introduced).


See also: 8086 CPU architecture, which was the first hit for "8086 code fetch". It confirms that fetch and execute do overlap, so it's pipelined in the most basic way.

TL;DR: It fetches into a buffer until it has a whole instruction to decode. Then it shifts any extra bytes to the front of the buffer, because they're part of the next instruction.

I've read that usually instruction-fetch is the bottleneck for 8086, so optimizing for code-size outweighed pretty much everything else.


A pipelined CPU wouldn't have to wait for execution of the previous instruction to finish to get started on decoding. Modern CPUs also have much higher bandwidth code-fetch, so they have a queue of decoded instructions ready to go (except when branches mess this up.) See http://agner.org/optimize/, and other links in the tag wiki.


Also, some very common instructions are a single byte, like `push r16`.
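For example, `push r16` is encoded as the single byte 0x50 plus the register number. A minimal Python sketch of that encoding, where `REG16` and `encode_push_r16` are illustrative names rather than part of any real assembler:

```python
# 16-bit registers in x86 encoding order (ax = 0 ... di = 7).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def encode_push_r16(reg):
    """Return the one-byte encoding of push r16: 0x50 + register number."""
    return bytes([0x50 + REG16.index(reg)])

print(encode_push_r16("bx").hex())  # 53
```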

Peter Cordes
  • I think we can get into the weeds of the decoding in my old favorite [**The Art of Assembly Sect 3.3.7**](https://courses.engr.illinois.edu/ece390/books/artofasm/CH03/CH03-3.html#HEADING3-102). It's an overview of the decoding. – David C. Rankin Aug 24 '16 at 00:09
  • The prefetch queue on the 8086 was only 6 bytes, while there was no limit to the length of instructions. You could use as many redundant prefixes as you wanted; the 15-byte limit was added with the '386. Even without redundant prefixes an instruction could exceed 6 bytes, e.g. `mov [es:0],1234` is 7 bytes long. However, without any prefixes at all I don't think an 8086 instruction can exceed 6 bytes. My guess is that prefix bytes were decoded separately and individually, as if they were their own instructions. – Ross Ridge Aug 24 '16 at 01:24
  • @RossRidge: Thanks. Sounds like a reasonable guess. If you or anyone else wants to edit this answer with further corrections, feel free. BTW, I think you're right about 5 or 6 bytes being the max without any prefixes; that sounds familiar. – Peter Cordes Aug 24 '16 at 01:30
  • "You're confusing byte-addressable memory with the bus width." Plus one. – old_timer Aug 24 '16 at 05:07