Why is "execute" located before "memory" in Instruction Set Architecture?

Question

I learned Processor Architecture 3 years ago.

To this day, I can't figure out why execute is located before memory in the sequential stages of an instruction.

When executing the instruction [mov (%eax), %ebx], doesn't it need to access memory?

Thanks!

Just take this as an example. The processor may "Fetch" the instruction, then decode it, then execute, and then "memory". My question is: shouldn't memory run before execute, since (%eax) will access memory? - wenwenhao, Aug 21 '12 at 01:23

osgx · Accepted Answer · 2014-02-21T12:33:34.957

Let's recall the classic RISC pipeline, which is the one usually studied: http://en.wikipedia.org/wiki/Classic_RISC_pipeline. Its stages are:

IF = Instruction Fetch
ID = Instruction Decode
EX = Execute
MEM = Memory access
WB = Register write back

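In a RISC design, only load and store instructions work with memory. For such a memory access instruction, the EX stage computes the address in memory (it takes the address from the register file, then scales it or adds an offset). The address is then passed to the MEM stage. This is why EX comes before MEM: the address has to be computed before memory can be accessed.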
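To make the address computation concrete, here is a minimal sketch of what EX does for a load or store. The function name `effective_address` and its parameters are hypothetical, chosen only for illustration, and the 32-bit wraparound is an assumption matching the 32-bit registers in the question.

```python
def effective_address(base, offset=0, index=0, scale=1):
    """Model the EX stage for a memory instruction: it produces an address,
    not data. All names here are illustrative, not taken from any real ISA
    manual. The result is wrapped to 32 bits (an assumption)."""
    return (base + index * scale + offset) & 0xFFFFFFFF


# mov (%eax), %ebx with eax = 0x1000: the ALU simply computes eax + 0
addr_plain = effective_address(0x1000)

# a displacement form such as 8(%eax): the ALU computes eax + 8
addr_disp = effective_address(0x1000, offset=8)

print(hex(addr_plain), hex(addr_disp))
```

Only after this value exists can the MEM stage use it to read or write memory.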

Your example, mov (%eax), %ebx, is actually a plain load from memory without any additional computation, and it can be represented even in the RISC pipeline:

IF - get the instruction from instruction memory.
ID - decode the instruction and pass the "eax" register to the ALU as an operand; remember "ebx" as the output for WB (in the control unit).
EX - compute "eax+0" in the ALU and pass the result to the next stage, MEM (as an address in memory).
MEM - take the address from the EX stage (from the ALU), go to memory and read the value (this stage can take several ticks to reach memory, blocking the pipeline meanwhile). Pass the value to WB.
WB - take the value from MEM and pass it back to the register file. The control unit should set the register file into the mode "Writing" + "EBX selected".
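The five steps above can be sketched as a tiny simulation. This is only an illustration under simplifying assumptions: registers and memory are plain dictionaries, the instruction is already parsed, and the name `run_load` is hypothetical.

```python
def run_load(regs, memory, src_reg, dst_reg):
    """Walk one load (dst_reg <- memory[regs[src_reg]]) through the five
    classic stages and return the order in which the stages ran."""
    trace = []

    # IF: fetch the instruction (modeled as an already-parsed tuple)
    instr = ("load", src_reg, dst_reg)
    trace.append("IF")

    # ID: decode, read the address register, remember the destination for WB
    _, src, dst = instr
    base = regs[src]
    trace.append("ID")

    # EX: the ALU computes the address, here simply base + 0
    address = base + 0
    trace.append("EX")

    # MEM: only now is the address known, so memory can be read
    value = memory[address]
    trace.append("MEM")

    # WB: write the loaded value into the destination register
    regs[dst] = value
    trace.append("WB")
    return trace


regs = {"eax": 0x1000, "ebx": 0}
memory = {0x1000: 42}
trace = run_load(regs, memory, "eax", "ebx")
print(trace, regs["ebx"])
```

Note that MEM cannot be moved ahead of EX in this sketch: `memory[address]` needs the `address` that EX produced.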

The situation is more complex for a true CISC instruction, e.g. add (%eax), %ebx (load word T from memory at [%eax], then store T+%ebx to %ebx). This instruction needs the ALU twice: once for the address computation and once for the addition. It can't easily be represented in the simplest RISC (MIPS) pipelines.

The first x86 CPU (8086) was not pipelined; it executed only a single instruction at any moment. But since the 80386 there has been a pipeline with 6 stages, which is more complex than the RISC one. There is a presentation about its pipeline that compares it with MIPS: http://www.academic.marist.edu/~jzbv/architecture/Projects/projects2004/INTEL%20X86%20PIPELINING.ppt

Slide 17 says:

Intel combines the mem and EX stages to avoid loads and stalls, but does create stalls for address computation
All stages in mips takes one cycle, where as Intel may take more than one for certain stages. This creates asymmetric performance

In my example, add will be executed in that combined "MEM+EX" stage for several CPU ticks, generating many stalls.

Modern x86 CPUs have very long pipelines (16 stages is typical), and internally they are RISC-like CPUs. The decoder stages (3 stages or more) break the most complex x86 instructions into a series of internal RISC-like micro-operations (sometimes up to 450 micro-operations per instruction are generated with the help of microcode; 2-3 micro-operations is more typical). For complex ALU/MEM operations, there will be a micro-op for the address computation, then a micro-op for the memory load, and then a micro-op for the ALU action. The micro-operations have dependencies between them and are scheduled onto different execution ports.
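As a rough sketch of that decomposition, here is add (%eax), %ebx split into three dependent micro-ops, following the description above. The micro-op names ("agu", "load", "add"), the temporaries `t_addr` and `t_val`, and both functions are hypothetical; real decoders use their own internal encodings, and this toy model runs the micro-ops strictly in order.

```python
def decode_add_mem(addr_reg, dst_reg):
    """Split `add (addr_reg), dst_reg` into three dependent micro-ops.
    Each micro-op is (operation, destination, source_a, source_b)."""
    return [
        ("agu", "t_addr", addr_reg, None),   # t_addr = addr_reg + 0
        ("load", "t_val", "t_addr", None),   # t_val  = memory[t_addr]
        ("add", dst_reg, dst_reg, "t_val"),  # dst    = dst + t_val
    ]


def execute(uops, regs, memory):
    """Run micro-ops in order; temporaries live next to the registers."""
    state = dict(regs)
    for op, dst, a, b in uops:
        if op == "agu":
            state[dst] = state[a]
        elif op == "load":
            state[dst] = memory[state[a]]
        elif op == "add":
            state[dst] = (state[a] + state[b]) & 0xFFFFFFFF
    # return only the architectural registers, not the temporaries
    return {name: state[name] for name in regs}


uops = decode_add_mem("eax", "ebx")
result = execute(uops, {"eax": 0x1000, "ebx": 5}, {0x1000: 37})
print(result)
```

Each micro-op reads a value produced by the one before it, which is the dependency chain mentioned above: address first, then the load, then the ALU operation.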
