A while back, I was reading up on some Android performance tips when I came across this:
Foo[] mArray = ...

public void zero() {
    int sum = 0;
    for (int i = 0; i < mArray.length; ++i) {
        sum += mArray[i].mSplat;
    }
}

public void one() {
    int sum = 0;
    Foo[] localArray = mArray;
    int len = localArray.length;

    for (int i = 0; i < len; ++i) {
        sum += localArray[i].mSplat;
    }
}
Google says:
zero() is slowest, because the JIT can't yet optimize away the cost of getting the array length once for every iteration through the loop.

one() is faster. It pulls everything out into local variables, avoiding the lookups. Only the array length offers a performance benefit.
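For reference, here is a self-contained version of the snippet that I can actually run. The Foo class, the array size, and the crude timing in main are just my own filler for the parts the docs leave out, not anything Google specifies; the two loops themselves match the quoted code.

class LoopDemo {
    static class Foo {
        int mSplat;
    }

    Foo[] mArray = new Foo[10000];

    LoopDemo() {
        for (int i = 0; i < mArray.length; ++i) {
            mArray[i] = new Foo();
        }
    }

    // Same loop as zero(): re-reads mArray.length on every iteration.
    // (I return sum so the work isn't trivially dead code.)
    int zero() {
        int sum = 0;
        for (int i = 0; i < mArray.length; ++i) {
            sum += mArray[i].mSplat;
        }
        return sum;
    }

    // Same loop as one(): the array and its length are hoisted into locals first.
    int one() {
        int sum = 0;
        Foo[] localArray = mArray;
        int len = localArray.length;
        for (int i = 0; i < len; ++i) {
            sum += localArray[i].mSplat;
        }
        return sum;
    }

    public static void main(String[] args) {
        LoopDemo demo = new LoopDemo();
        // Very crude timing, just so this runs end to end; a real measurement
        // would need a proper benchmark harness and warm-up.
        long t0 = System.nanoTime();
        int a = demo.zero();
        long t1 = System.nanoTime();
        int b = demo.one();
        long t2 = System.nanoTime();
        System.out.println("zero(): " + a + " in " + (t1 - t0) + " ns");
        System.out.println("one():  " + b + " in " + (t2 - t1) + " ns");
    }
}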
That made total sense. But after thinking way too much about my computer architecture exam, I remembered branch predictors:
a branch predictor is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure. The purpose of the branch predictor is to improve the flow in the instruction pipeline.
Isn't the computer assuming i < mArray.length is true and thus computing the loop condition and the body of the loop in parallel (only mispredicting on the final iteration), effectively removing any performance losses?
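To make it concrete, this is roughly how I picture each iteration of zero() (my own desugaring, not actual JIT or compiler output, reusing the Foo/mArray setup from the runnable sketch above). The branch that gets predicted is separate from the two loads that feed it:

// My own desugaring of zero()'s loop, not actual JIT output.
int zeroDesugared() {
    int sum = 0;
    int i = 0;
    while (true) {
        Foo[] a = mArray;     // field load, every iteration
        int len = a.length;   // array-length load, every iteration
        if (i >= len) {       // the conditional branch being predicted
            break;
        }
        sum += a[i].mSplat;   // loop body
        ++i;
    }
    return sum;
}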
I was also thinking about Speculative Execution:
Speculative execution is an optimization technique where a computer system performs some task that may not be actually needed... The objective is to provide more concurrency...
In this case, the computer would be executing the code both as if the loop had finished and as if it were still running, concurrently, which would once again effectively nullify any computational cost of the condition (since the computer is already doing future work while it evaluates the condition)?
Essentially, what I'm trying to get at is that even if the condition in zero() takes a little longer to compute than the one in one(), the computer is usually going to compute the correct branch of code while it waits for the result of the conditional anyway, so the performance loss from the mArray.length lookup shouldn't matter (that's what I thought, anyway).
Is there something I'm not realizing here?
Sorry about the length of the question.
Thanks in advance.