I have a performance-critical queue container that does a lot of random inserts and deletes. In my container class, I have a fixed-size `Uint32Array` called `updates`, and in my insert logic I need to compare a value to every value in this `updates` array.

Originally, I did it simply like this:

const u = this.updates;
for (let i = 0; i < u.length; i++) bucket_offset += bi >= u[i];

Then, just as I was putting the finishing touches on the container and just screwing around, I tried unrolling said loop:

const u = this.updates;
bucket_offset += (bi >= u[0]) + (bi >= u[1]) + (bi >= u[2]) + (bi >= u[3]) + (bi >= u[4]) + (bi >= u[5]) + (bi >= u[6]) + (bi >= u[7]) + (bi >= u[8]) + (bi >= u[9]) + (bi >= u[10]) + (bi >= u[11]) + (bi >= u[12]) + (bi >= u[13]) + (bi >= u[14]) + (bi >= u[15]);

It turns out this is around 10x faster in Chrome, making the whole insert ~30% faster.
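
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the two variants side by side. The names `countLoop` and `countUnrolled` and the sample contents of `updates` are placeholders made up for the example, not the real container code; the only thing taken from the real code is the fixed length of 16.

```javascript
// Placeholder stand-in for this.updates: a fixed-size array of 16 entries
// holding 0, 10, 20, ..., 150.
const updates = Uint32Array.from({ length: 16 }, (_, i) => i * 10);

// Loop form: counts the entries that are <= bi.
// Each comparison yields a boolean, which += coerces to 0 or 1.
function countLoop(u, bi) {
  let n = 0;
  for (let i = 0; i < u.length; i++) n += bi >= u[i];
  return n;
}

// Manually unrolled form for exactly 16 entries.
function countUnrolled(u, bi) {
  return (bi >= u[0]) + (bi >= u[1]) + (bi >= u[2]) + (bi >= u[3]) +
         (bi >= u[4]) + (bi >= u[5]) + (bi >= u[6]) + (bi >= u[7]) +
         (bi >= u[8]) + (bi >= u[9]) + (bi >= u[10]) + (bi >= u[11]) +
         (bi >= u[12]) + (bi >= u[13]) + (bi >= u[14]) + (bi >= u[15]);
}

// Sanity check: both forms should agree for every probe value.
let agree = true;
for (let bi = 0; bi <= 200; bi++) {
  if (countLoop(updates, bi) !== countUnrolled(updates, bi)) agree = false;
}
console.log("loop and unrolled agree:", agree);
```

The sketch only checks that the two forms compute the same count; it makes no claim about timing, which has to be measured in the actual insert path.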

After that revelation, I'm looking for a way to make the VM understand that it's okay to unroll that loop; after all, `this.updates.length` is a constant.

At the end of the day, it can stay the way it is but I'd prefer a loop since it would just look nicer. Any ideas?

My main target is the Chrome browser running on V8.

Jonas Wilms
user81993
  • How does `for (let i = 0; i < 15; i++)` perform? – Jonas Wilms Jan 23 '22 at 01:41
  • These optimizations are engine specific, so which engine are you optimizing for? – Jonas Wilms Jan 23 '22 at 01:43
  • Any luck here? https://stackoverflow.com/questions/13669910/unrolling-javascript-loops – Spencer May Jan 23 '22 at 02:05
  • Is that loop equivalent to `bucket_offset += u.filter(y => y < bi).length` ? – Eineki Jan 23 '22 at 02:08
  • @JonasWilms I think you meant 16, but it performs the same as .length. This is 99% meant to run in the chrome browser (as a part of a game), even though I do look at other browsers, their performance for this and various other algorithms I use is.. disappointing so for now I'm just focusing on chrome. – user81993 Jan 23 '22 at 04:24
  • @Eineki it is equivalent to y => bi >= y, also I did try out its performance and its worse than the loop – user81993 Jan 23 '22 at 04:33
  • @jmrk wrote [this answer](https://stackoverflow.com/a/66082249/367865). Maybe they have an idea as to why manually unrolling this loop results in much faster execution and why the engine either isn't unrolling the loop or is unrolling the loop but getting results slower than the manually unrolled version. – Ouroborus Jan 23 '22 at 07:11
  • From the duplicate, it seems that V8 did not unroll loops in 2021, so if nothing changed dramatically the answer is probably _no_. It surprised me though, I thought V8 would unroll, at least in trivial cases .... – Jonas Wilms Jan 23 '22 at 11:18
  • Since you are creating the `Uint32Array` you know its length. Loop over the same constant you passed to the `Uint32Array` constructor. – Akxe Jan 23 '22 at 11:19
  • @JonasWilms the answer on the duplicate states that "*we currently have an ongoing project to unroll more loops; […] we're still fine-tuning the heuristics for when it should kick in*". So it might actually be worth following up what came out of that project, and maybe to report a bug that the heuristics should detect this particular case. – Bergi Jan 23 '22 at 14:30
  • @Bergi if I read https://github.com/v8/v8/blob/lkgr/src/compiler/loop-unrolling.h correctly, then TurboFan unrolls loops with less than 5 iterations, though the commit message suggests it's only available for wasm? – Jonas Wilms Jan 23 '22 at 17:47
  • @bergi jmrk confirmed that this is still WASM only, so the answer stays _no_ (for now). – Jonas Wilms Jan 23 '22 at 20:08
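
Given the conclusion in the comments that V8 does not unroll this loop for JavaScript, one workaround worth trying is to keep a loop in the source but use it to generate the unrolled expression once at startup. This is a different technique from anything proposed above (runtime code generation with `new Function`), and the helper name `makeUnrolledCounter` is made up for this sketch. Whether the generated function matches the speed of the hand-written version has not been measured here.

```javascript
// Hypothetical helper: builds a function whose body is the unrolled sum
// (bi >= u[0]) + (bi >= u[1]) + ... for a length n known up front.
function makeUnrolledCounter(n) {
  const terms = [];
  for (let i = 0; i < n; i++) terms.push(`(bi >= u[${i}])`);
  // Fall back to a constant 0 so that n === 0 still yields valid source.
  const body = terms.length > 0 ? terms.join(" + ") : "0";
  // new Function compiles the source string once; the result is an
  // ordinary function taking (u, bi).
  return new Function("u", "bi", `return ${body};`);
}

// Build the counter once, using the same constant that sized the array.
const SIZE = 16; // assumed to match the Uint32Array constructor argument
const countUnrolled16 = makeUnrolledCounter(SIZE);

// Example use with placeholder data (all zeros except two entries).
const u = new Uint32Array(SIZE);
u[0] = 5;
u[1] = 50;
console.log(countUnrolled16(u, 10));
```

Note that `new Function` is blocked on pages whose Content Security Policy does not allow `unsafe-eval`, so this only works where that restriction is absent.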
