
Consider the following snippets and the results of running them:

Snippet 1:

let final_result, final_result2;
let start = new Date();
for(let i = 0; i < 100000000; i++) {
    final_result = Math.pow(i + 1, 2);
}
let end = new Date();
console.log(end - start); // Output 1

let start2 = new Date();
for(let i = 0; i < 100000000; i++) {
    final_result2 = (i + 1) ** 2;
}
let end2 = new Date();
console.log(end2 - start2); // Output 2

Snippet 2:

let final_result, final_result2;
let start = new Date();
for(let i = 0; i < 100000000; i++) {
    final_result = Math.pow(i, 2);
}
let end = new Date();
console.log(end - start); // Output 1

let start2 = new Date();
for(let i = 0; i < 100000000; i++) {
    final_result2 = i ** 2;
}
let end2 = new Date();
console.log(end2 - start2); // Output 2

Snippet 3:

let final_result, final_result2;

function t1(){
    for(let i = 0; i < 100000000; i++) {
        final_result = Math.pow(i, 2);
    }
}

function t2(){
    for(let i = 0; i < 100000000; i++) {
        final_result2 = i ** 2;
    }
}

let start = new Date();
t1();
let end = new Date();
console.log(end - start); // Output 1

let start2 = new Date();
t2();
let end2 = new Date();
console.log(end2 - start2); // Output 2

Results:

| Output               | Firefox 88 (ms) | Edge 90 (ms) |
| -------------------- | --------------- | ------------ |
| Snippet 1 - Output 1 | 63              | 467          |
| Snippet 1 - Output 2 | 63              | 487          |
| Snippet 2 - Output 1 | 63              | 468          |
| Snippet 2 - Output 2 | 63              | 1180         |
| Snippet 3 - Output 1 | 64              | 480          |
| Snippet 3 - Output 2 | 64              | 1200         |

These results were obtained consistently over numerous tests, and the number being added did not affect performance; other similar operations ((i * 1) ** 2, (i + i) ** 2, etc.) all resulted in a speed-up over plain i ** 2. Meanwhile, Math.pow was consistent in its speed.

How can repeated calculations of (i + n) ** 2 be faster than i ** 2, when the latter has less to calculate, in a V8-based browser (Edge and Chrome both gave similar results)? Meanwhile, Firefox's runtime was consistent between the two snippets.

Nick is tired
  • What happens if you put these tests into separate functions? I'm not exactly sure how engines optimize hot loops. – Jonas Wilms May 03 '21 at 16:40
  • @JonasWilms Negligibly slower in both FF and Edge, same results though, updated – Nick is tired May 03 '21 at 16:46
  • Just realised my edit using separate functions doesn't include the `(i+1)**2` test, but rest assured, the results are the same in both cases (as in, the same issue presents itself) – Nick is tired May 03 '21 at 16:52
  • `function t2_mod(){ for(let i = 0; i < 1000; i++) { for(let k = 0; k < 100000; k++) { final_result = k ** 2; } } }` For smaller i, performance is more or less the same – Jonas Wilms May 03 '21 at 17:18
  • After some tests, it seems this only happens for numbers larger than 10000000, and then it consistently stays twice as slow. I'd speculate that for some reason, V8 optimizes the second case to use integer multiplication or something like that, and then when numbers reach MAX_SAFE_INTEGER it has to fall back to a slower version. Pure speculation though, I don't have time to dig through V8's various compilers. Hopefully jmrk passes by :) – Jonas Wilms May 03 '21 at 17:26
  • @JonasWilms The initial set of code that I was testing with did actually use 10,000,000 (and had the same problem) rather than the 100,000,000 used in the examples above, but FF's results were as low as 8ms in that case, which is short enough that I thought that other things could start having an effect – Nick is tired May 03 '21 at 17:39
  • 1
    Maybe there's one 0 too much, but for smaller k (and same number of iterations) performance was equal for me on NodeJS 12 – Jonas Wilms May 03 '21 at 17:42
  • @JonasWilms So in Edge I could test as low as 1,000,000 ([screenshot for `i**2`](https://i.stack.imgur.com/E4hQU.png), [screenshot for `(i+1)**2`](https://i.stack.imgur.com/oxA49.png)), below that I was getting 0-1ms results in both cases. Looks like similar results to me, but obviously less pronounced – Nick is tired May 03 '21 at 17:47
  • Add an outer loop to keep the number of iterations constant, otherwise the test doesn't make sense. – Jonas Wilms May 03 '21 at 17:53
  • @JonasWilms Right, gotcha, yes, I see what you're saying. My first thought when I saw the issue was some background type coercion or something, but guess will see when someone who knows V8 better than me shows up :) – Nick is tired May 03 '21 at 17:54
  • 2
    @JonasWilms: good intuition, but it's not `MAX_SAFE_INTEGER` that's relevant for performance/optimizations, it's 32-bit or 31-bit integer range (depending on which optimization/scenario we're talking about). – jmrk May 04 '21 at 00:38

1 Answer


How can repeated calculations of (i + n) ** 2 be faster than i ** 2 when the latter has less to calculate?

That's because this microbenchmark is not measuring exponentiation time. Beware of misleading microbenchmarks!

Instead, what it is measuring is:

  • HeapNumber allocations and related operations (write barriers, garbage collection),
  • in the slower cases, function call overhead (as opposed to inlining) and some of the checks dictated by the JS spec (that didn't get optimized away).

One of the fundamental architectural differences between SpiderMonkey and V8 is that the former uses "NaN-boxing" whereas the latter uses "pointer-tagging". Both have pros and cons; in this particular case the consequence is that V8 needs to allocate a fresh "HeapNumber" for every result that you write to final_result, whereas Firefox can just write the raw IEEE double there. (This is pretty much the worst-case comparison for the pointer-tagging approach.) That explains the speed difference between the two engines. This is easy to verify by modifying the test such that it stores the results into an array (i.e. let final_result = []; and final_result[0] = ...), in which case V8's "array elements kind" tracking kicks in and it stores raw doubles as well.
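
A minimal sketch of that verification (illustrative code, not from the original post; variable names are arbitrary):

let final_result = [];

let start = new Date();
for (let i = 0; i < 100000000; i++) {
    // Storing the result into an array element lets V8's "elements kind"
    // tracking keep the values as raw (unboxed) doubles, so no HeapNumber
    // needs to be allocated per iteration.
    final_result[0] = i ** 2;
}
let end = new Date();
console.log(end - start);

With this change, the per-iteration HeapNumber allocation described above should no longer happen, and the gap between the two engines should shrink accordingly.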

The slower cases, which use ** instead of Math.pow, appear to reflect untapped optimization potential in V8. There's a key comment in the source:

// We currently don't optimize exponentiation based on feedback.

The commit that introduced this comment gives more background: the ** operator used to be "syntactic sugar" for Math.pow, and V8 actually implemented it by "desugaring" it to the latter; but with the introduction of BigInt support, it had to stop doing that. As usual, the first implementation aimed at correctness rather than maximum performance, and this first implementation is still in use today (which probably implies that this detail isn't particularly important in real-world code... otherwise someone would have complained before). That means that V8's optimizing compiler currently lacks the type feedback it would need to inline the exponentiation; instead it emits a call to a "built-in", which has to allocate a HeapNumber for the result it wants to return. But, being a reasonably clever optimizing compiler, it can propagate type information from elsewhere; that's why adding some other operation (such as (i+1) ** 2, or even (i+0) ** 2) has a beneficial impact in this case.
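
As a hedged illustration (not code from the answer, and the exact numbers will vary by engine version), the two loops below compute the same values but can perform quite differently in current V8, because the added + 0 supplies number type feedback that the optimizing compiler can propagate into the exponentiation:

let r1, r2;

let startA = new Date();
for (let i = 0; i < 100000000; i++) {
    r1 = i ** 2;         // no type feedback for **: V8 calls a built-in that returns a HeapNumber
}
console.log(new Date() - startA);

let startB = new Date();
for (let i = 0; i < 100000000; i++) {
    r2 = (i + 0) ** 2;   // the extra "+ 0" generates number feedback the compiler can propagate
}
console.log(new Date() - startB);

Whether this trick is worth using in real code is questionable; as the measurements above show, Math.pow(i, 2) was already consistently fast.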

Summary: Don't be fooled by microbenchmarks. Drawing useful conclusions from a microbenchmark really requires inspecting what the engine is doing under the hood; otherwise there's an overwhelming chance that you're not measuring what you think you're measuring. Also, this is a nice illustration of the other problem with microbenchmarks: chances are that in your real code, something is different about the surrounding circumstances (e.g. you may be storing into arrays, or you may be doing additional operations that generate type feedback, etc), so the results from the microbenchmark likely aren't even applicable.

jmrk