How to implement a pipelined CISC CPU correctly?

Question

I'm working on my college graduation project. I already built one pipelined CPU and simulated it in Logisim for one of my courses, but now I need to put the CPU on my FPGA and also write a game for it. So I decided to improve my architecture, and I also changed some functions because I didn't really know how they should work. The problem is with memory access. In the first picture you can see my first CPU. The logic was: read opcode -> decode and prepare operands -> read memory if needed -> check for a jump (CJE, CJNE, ...) -> arithmetic and logic calculations -> write the result to a register or to memory. (In that last stage I delayed the whole pipeline by one clock when needed, so that the write would not collide with memory read operations. This was my way of avoiding a memory hazard.)

Pipe1

CPU Diagram

Now I want to add graphics to my game, so I added one more stage that will cooperate with the PPU (Picture Processing Unit).

Pipe2

Also, I have seen a lot of examples on the Internet that put the memory stage after the execution stage, and I don't understand that. How can I implement an instruction like `ADD A, MEM` if I need to read the MEM variable from memory and then add it to A? Or am I missing something? Can you please help me with that?
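To make the question concrete, here is a minimal Python sketch of the stage order described above, with the memory read placed before execution so that `ADD A, MEM` has its operand ready when it reaches the ALU. Everything in it (the `Instr` and `Latch` types, the stage function names, the two-field mini-ISA) is hypothetical and only illustrates the dataflow. It pushes one instruction through the stages in sequence and does not model overlap between instructions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Instr:
    """Hypothetical mini-ISA: one destination register, one source operand."""
    op: str                          # "ADD" or "MOV"
    dst: str                         # destination register name
    mem_addr: Optional[int] = None   # memory source operand, if any
    imm: Optional[int] = None        # immediate source operand, if any


@dataclass
class Latch:
    """Pipeline register carried from one stage to the next."""
    instr: Instr
    operand: int = 0   # source value, filled in by decode or mem_read
    result: int = 0    # filled in by execute


def decode(instr: Instr) -> Latch:
    latch = Latch(instr)
    if instr.imm is not None:
        latch.operand = instr.imm
    return latch


def mem_read(latch: Latch, memory: Dict[int, int]) -> Latch:
    # Runs BEFORE execute, so a memory source operand is ready for the ALU.
    if latch.instr.mem_addr is not None:
        latch.operand = memory[latch.instr.mem_addr]
    return latch


def execute(latch: Latch, regs: Dict[str, int]) -> Latch:
    if latch.instr.op == "ADD":
        latch.result = regs[latch.instr.dst] + latch.operand
    elif latch.instr.op == "MOV":
        latch.result = latch.operand
    else:
        raise ValueError(f"unknown op {latch.instr.op!r}")
    return latch


def write_back(latch: Latch, regs: Dict[str, int]) -> None:
    regs[latch.instr.dst] = latch.result


def run_one(instr: Instr, regs: Dict[str, int], memory: Dict[int, int]) -> None:
    """Push a single instruction through all stages in order."""
    latch = decode(instr)
    latch = mem_read(latch, memory)
    latch = execute(latch, regs)
    write_back(latch, regs)
```

With `regs = {"A": 5}` and `memory = {0x10: 7}`, calling `run_one(Instr("ADD", "A", mem_addr=0x10), regs, memory)` leaves `regs["A"]` equal to 12. In a pipeline where the memory stage comes after execute, the same instruction would reach the ALU before its operand had been read, which is why those designs restrict memory access to separate load and store instructions.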

Pipelining is **much** easier to implement with a RISC ISA, where all memory accesses are either loads or stores. Instructions like `ADD A, MEM` are forbidden in this model. They must be split into two instructions (or uops). — Alain Merigot, Apr 01 '19 at 12:01
@AlainMerigot: or you implement it like 486 / Pentium, where memory-source instructions take multiple cycles. (E.g. https://agner.org/optimize/ has P5 Pentium instruction timings: `add reg, [mem]` takes 2 cycles and is pairable in either pipe. I don't think splitting internally into uops is an accurate description of how P5 Pentium handles memory-source or memory-dest instructions; that came later in P6 to enable out-of-order exec.) But when designing an ISA from scratch, it would be nearly pointless to make it CISC but then make only the RISC subset of it pipeline fully efficiently. — Peter Cordes, Apr 01 '19 at 12:20
But yes, it's obviously much harder and more complex. You might put a load stage before exec, and a store stage after? But then you have to detect memory dependency hazards for store-forwarding or stalling. In-order Atom does do LEA on its AGUs, in a stage before normal exec, so probably it has actual memory access there, too. Maybe you could have that pre-exec memory access also write into the store buffer for store or RMW instructions, so later loads will see that there's a store or pending store. Or just don't let the next instruction start for 3 cycles after a RMW, like P5 Pentium! — Peter Cordes, Apr 01 '19 at 12:25
@PeterCordes thank you for your answer. I think I will keep the memory read before execution and add a memory hazard unit in the WB stage to check for collisions. It will not stall the pipeline every time I need to write to memory, but only if another read operation is happening at the same time. — Stanislav Ryzhkov, Apr 01 '19 at 18:45
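The stall condition discussed in these comments can be sketched as a small combinational check. This is only an illustration under stated assumptions: a single-ported memory shared by the read stage and the WB stage, and no store forwarding, so a load also has to wait while an older store to the same address is still in flight. The `StageInfo` type and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StageInfo:
    """What the hazard unit needs to know about one pipeline stage."""
    reads_mem: bool = False
    writes_mem: bool = False
    addr: Optional[int] = None


def port_conflict(read_stage: StageInfo, wb_stage: StageInfo) -> bool:
    # Structural hazard: a single-ported memory cannot serve the read
    # stage and the write-back stage in the same cycle.
    return read_stage.reads_mem and wb_stage.writes_mem


def stale_read(read_stage: StageInfo, older: Iterable[StageInfo]) -> bool:
    # Data hazard: an older store to the same address has not reached
    # memory yet, so the load would see the old value.
    if not read_stage.reads_mem:
        return False
    return any(s.writes_mem and s.addr == read_stage.addr for s in older)


def stall_front_end(read_stage: StageInfo,
                    exec_stage: StageInfo,
                    wb_stage: StageInfo) -> bool:
    """True when the read stage (and everything behind it) must wait."""
    return (port_conflict(read_stage, wb_stage)
            or stale_read(read_stage, [exec_stage, wb_stage]))
```

For example, `stall_front_end(StageInfo(reads_mem=True, addr=0x20), StageInfo(), StageInfo(writes_mem=True, addr=0x40))` returns `True` because of the port conflict, while a store in WB with no load in the read stage does not stall anything. A store buffer with forwarding, as suggested above, would remove the `stale_read` stalls at the cost of extra hardware.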
