Branch prediction and performance

Question

I'm reading a book about computer architecture and I'm on this chapter talking about branch prediction. There is this little exercise that I'm having a hard time wrapping my head around it.

Consider the following inner for loop

for (j = 0; j < 2; j++)
{
    for (i = 10; i > 0; i = i-1)
        x[i] = x[i] + s
}

-------> Inner loop:

L.D       F0, 0(R1)
ADD.D     F4, F0, F2
S.D       F4, 0(R1)
DADDUI    R1, R1, -8
BNE       R1, R3, Loop

Assume register F2 holds the scalar s, R1 holds the address of x[10], and R3 is pre-computed to end the loop when i == 0;

a) How would a predictor that alternates between taken/not taken perform?

---- Since the loop is only executed 2 times, I think that the alternate prediction would harm the performance in this case (?) with 1 miss prediction.

b) Would a 1-bit branch prediction buffer improve performance (compare to a)? Assume the first prediction is "not taken", and no other branches map to this entry.

---- Assuming the first prediction is "not taken", and 1-bit predictor invert the bit if the prediction is wrong. So it will be NT/T/T. Does that make it have the same performance as problem a) ? with 1 miss prediction.

c) Would a 2-bit branch prediction buffer improve performance (compare to a)? Assume the first prediction is "not taken", and no other branches map to this entry.

---- 2-bit branch prediction starting with "not taken". As I remember 2 bit prediction change after it misses twice. So this prediction will go like NT/NT/T/T. Therefore its performance will be worse compare to a). 1 miss prediction

That was my attempt to answer the problems. Can anyone explain to me if my answer is right/wrong in more detail please? Thanks.

score 0 · Answer 1 · edited Jun 20 '20 at 09:12

0

Since the loop is only executed 2 times

You mean the outer-loop conditional, the one you didn't show asm for? I'm only answering part of the question for now, in case this confusion was your main issue. Leave a comment if this wasn't what had you confused.

The conditional branch at the bottom of the inner loop is executed 20 times, with this patter: 9xT, 1xNT, 9xT, 1xNT. An alternating predictor there would be wrong about 50% of the time, +/- 20% depending on whether it started right or wrong.

It's the outer loop that only runs twice: T,NT. The whole inner loop runs twice.

The outer loop branch would either be predicted perfectly or terribly, depending on whether the alternating prediction started with T or with NT.

edited Jun 20 '20 at 09:12

Community

1
1

answered May 18 '16 at 09:57

Peter Cordes

328,167
45
605
847

Oh I had that inner loop understanding wrong. I see now. Is my understanding of 1-bit branch prediction and 2-bit prediction correct for B) and C)? (ignoring the wrong loop) – Nguyen Tran May 18 '16 at 11:56
And for question a). Assuming it starts out as NT, but that branch is suppose to be taken, does the system go back and taken that loop afterward? Does it means that it would take twice as long to run the inner loop branch? – Nguyen Tran May 18 '16 at 12:01
@NguyenTran: The CPU always "appears" to run instructions in the correct program order. As soon as a mispredict is detected, all the incorrectly-speculated work is thrown away, and it starts executing from the correct side of the branch. A mispredict usually costs many times more cycles than a correctly-predicted branch. (e.g. 15 cycles vs. 1 cycle for a long pipeline). – Peter Cordes May 19 '16 at 01:08
re: B and C: can you edit the question to make sure it's clear that you're asking the right question, now that you've realized your misunderstanding of the inner loop? I don't want to have to guess what you actually think now that you understand the inner loop. – Peter Cordes May 19 '16 at 01:10

Branch prediction and performance

1 Answers1