I am going through the textbook Computer Organization and Design and I am a bit confused with the Branch Prediction and how it works with a 5 stage pipeline scenario - IF ID EX MEM WB.
Consider the following sequence of instructions:
TOP: SUB X2, X2, X3
.
.
B.NE TOP
ADD X1, X1, X2
Assume the first case with no branch prediction and all possible forwarding paths. As per the textbook, when the Branch to the TOP is taken, the processor would incur a penalty of 1 stall. This is because after the B.NE instruction, the next instruction in the pipeline would be the ADD instruction when it should really have been the SUB instruction. The processor realizes that it inserted the incorrect instruction into the pipeline only at the end of the ID stage of the B.NE instruction and hence has to nop all the remaining stages of ADD (by the end of the ID stage it also manages to calculate the correct address to fetch the instruction from). So the pipeline in this case looks something like this:
However if the branch was not taken, there would have been no stalls. Because the next instruction would correctly have been the ADD instruction and the execution would have proceeded normally.
Now consider the same instructions and the same processor but with perfect branch prediction. Assume Branch is taken. The processor would know that the instruction is a Branch instruction only during the ID stage for the B.NE instruction. And the Branch Prediction would kick in only after that. By that time, the ADD instruction is already in the pipeline. Hence there would still be a penalty of 1 stall. So what is the advantage of even having the Branch Prediction? I am clearly missing something.
So I think I am confused with where exactly in the pipeline does the Branch Prediction kick in?