What does it really mean to "Squash" an instruction?

Question

I'm studying pipelined processors, and the material mentions predicting a branch as taken or not taken, inserting instructions in the sort of "interim" before we decide if the branch is taken or not, and then "squashing" them if we guessed wrong.

How do we squash them? Do we just not write back? How many instructions can we insert in the interim between figuring out it's a branch and figuring out if it's taken or not taken? I guess just one, right - because we figure out IT IS a branch in ID, and then figure out if it's taken or not in EX? So just one squashable instr.?

IF->ID->EX->MEM->WB

Leeor · Accepted Answer · 2014-10-26T18:35:29.283

The instructions that are in the shadow of a mispredicted branch are bogus (incorrect) and must never be seen by the user. Pipelining is a little bit of cheating: to the outer world we pretend that the CPU performs one instruction at a time, in perfect sequential order. Therefore, these speculative instructions should be cleared without having any architectural effect, such as writing to memory or changing the value of a register.

Since execution is done in-order, and the final branch resolution is known at EX, you should have plenty of time to tell the following instructions in the pipeline to cancel themselves (effectively replacing them with bubbles, where the machine does nothing for them in any of the following pipe stages). The stages where an instruction makes an impact on the machine state are further down the pipe, so none of the bad instructions would reach them too early. This gets much more complicated on an out-of-order machine, but that's a different story. The only wasted work resides in the preceding stages, of which you have 2 here (IF + ID), and possibly (depending on your design) the next instruction about to be fetched into IF in the next cycle, in case you don't have time to correct your program counter following the branch. On more complicated machines you may have a deeper pipeline with more stages, so the penalty grows.

The act of clearing the bad instructions that follow a mispredicted branch is usually called flushing, clearing or squashing (note that these terms may also have different meanings in computer architecture, so it's not a technical term as much as it is a graphic description).
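To make the "replace them with bubbles" idea concrete, here is a minimal toy sketch in Python. It is not how any real CPU is built: it assumes a static predict-not-taken policy, a branch resolved in EX with the redirect taking effect on the next fetch, and made-up names (`Instr`, `run`, `PROGRAM`) chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

BUBBLE = None  # an empty pipeline slot: later stages do nothing for it


@dataclass
class Instr:
    name: str
    branch_target: Optional[int] = None  # program index to jump to, if a branch
    taken: bool = False                  # real outcome, only known at EX


def run(program, max_cycles=50):
    """Toy predict-not-taken pipeline; returns (committed, squashed) names."""
    # pipe[0]=IF, pipe[1]=ID, pipe[2]=EX, pipe[3]=MEM, pipe[4]=WB
    pipe = [BUBBLE] * 5
    pc = 0
    committed, squashed = [], []
    for _ in range(max_cycles):
        # Fetch the next sequential instruction (the "not taken" guess).
        fetched = program[pc] if pc < len(program) else BUBBLE
        if fetched is not BUBBLE:
            pc += 1
        # Every instruction moves one stage down the pipe.
        pipe = [fetched] + pipe[:4]
        # WB is the only stage here with an architectural effect.
        if pipe[4] is not BUBBLE:
            committed.append(pipe[4].name)
        # EX resolves the branch. A taken branch means the guess was wrong.
        ex = pipe[2]
        if ex is not BUBBLE and ex.branch_target is not None and ex.taken:
            for stage in (1, 0):  # ID first (older), then IF (younger)
                if pipe[stage] is not BUBBLE:
                    squashed.append(pipe[stage].name)
                    pipe[stage] = BUBBLE  # squash: turn it into a bubble
            pc = ex.branch_target  # redirect fetch to the correct path
        if pc >= len(program) and all(slot is BUBBLE for slot in pipe):
            break
    return committed, squashed


PROGRAM = [
    Instr("add"),
    Instr("beq", branch_target=4, taken=True),
    Instr("sub"),  # wrong path
    Instr("or"),   # wrong path
    Instr("and"),  # branch target
]

committed, squashed = run(PROGRAM)
print(committed)  # ['add', 'beq', 'and']
print(squashed)   # ['sub', 'or']
```

The two wrong-path instructions were sitting in ID and IF when the branch reached EX, so they become bubbles long before they could reach WB, which is why nothing has to be undone.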

score 1 · Answer 2 · edited May 23 '17 at 12:07

I'll add to Leeor's answer by saying that branch prediction is more influential in the case of heavily pipelined superscalar processors. Although dynamism (coming from OoO) and speculation (coming from branch prediction) are two separate concepts, their interplay is complex. Moreover, since they usually appear together on modern architectures, they are sometimes confused.

In the simple 5 stage pipeline that you consider, it takes one clock cycle between ID and EX to determine if a branch was taken or not. On more complex architectures the penalty for a misprediction is on the order of tens of clock cycles, which potentially corresponds to even more fetched instructions if the architecture is superscalar.

When considering such a big set of fetched instructions even squashing gets complicated, e.g. what happens if there is a store of a value which we are not sure is correct? What if we encounter another branch, etc.?

Moreover, a misprediction corresponds to a big hit in terms of performance, therefore more advanced techniques like Selective Replay and Value Prediction are used to ameliorate this penalty.
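As a rough back-of-the-envelope sketch of that scaling (the function name and the numbers 15 and 4 are hypothetical, picked only for illustration and not taken from any specific CPU), the upper bound on wrong-path work is the penalty in cycles times the number of instructions the front end can bring in per cycle:

```python
def wasted_issue_slots(penalty_cycles: int, issue_width: int) -> int:
    """Upper bound on instruction slots thrown away by one misprediction."""
    return penalty_cycles * issue_width


# Scalar pipeline with the two squashed slots (IF + ID) from the accepted answer.
print(wasted_issue_slots(penalty_cycles=2, issue_width=1))   # 2

# Hypothetical deep, 4-wide superscalar machine with a 15-cycle penalty.
print(wasted_issue_slots(penalty_cycles=15, issue_width=4))  # 60
```

This is only an upper bound; a real front end rarely fills every slot on every cycle.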
