
I tried the following code (it shows similar results in Google Chrome and Node.js):

var t = new Array(200000); console.time('wtf'); for (var i = 0; i < 200000; ++i) {t.push(Math.random());} console.timeEnd('wtf');
wtf: 27839.499ms
undefined

I also ran these tests:

var t = []; console.time('wtf'); for (var i = 0; i < 400000; ++i) {t.push(Math.random());} console.timeEnd('wtf');
wtf: 449.948ms
undefined
var t = []; console.time('wtf'); for (var i = 0; i < 400000; ++i) {t.push(undefined);} console.timeEnd('wtf');
wtf: 406.710ms
undefined

But in Firefox all looks fine with the first variant:

>>> var t = new Array(200000); console.time('wtf'); ...{t.push(Math.random());} console.timeEnd('wtf');
wtf: 602ms

What happens in V8?

UPD: *mysteriously degrading performance*

var t = new Array(99999); console.time('wtf'); for (var i = 0; i < 200000; ++i) {t.push(Math.random());} console.timeEnd('wtf');
wtf: 220.936ms
undefined
var t = new Array(100000); t[99999] = 1; console.time('wtf'); for (var i = 0; i < 200000; ++i) {t.push(Math.random());} console.timeEnd('wtf');
wtf: 1731.641ms
undefined
var t = new Array(100001); console.time('wtf'); for (var i = 0; i < 200000; ++i) {t.push(Math.random());} console.timeEnd('wtf');
wtf: 1703.336ms
undefined
var t = new Array(180000); console.time('wtf'); for (var i = 0; i < 200000; ++i) {t.push(Math.random());} console.timeEnd('wtf');
wtf: 1725.107ms
undefined
var t = new Array(181000); console.time('wtf'); for (var i = 0; i < 200000; ++i) {t.push(Math.random());} console.timeEnd('wtf');
wtf: 27587.669ms
undefined
Yurij
  • Why do you want `new Array(200000)`? It doesn't do anything but set `length`. (It doesn't, for instance, pre-allocate any storage, because arrays aren't really arrays.) – T.J. Crowder Jun 06 '13 at 12:17
  • I wrote that code just for test. I wonder, why it shows so terrible performance. – Yurij Jun 06 '13 at 12:18
  • I've got to leave and have no time to answer, but the answer is simple. V8 falls back to sparse arrays in your first example, and optimizes to C-like sequential memory arrays in your second. See https://code.google.com/p/v8/source/browse/trunk/src/array.js – Benjamin Gruenbaum Jun 06 '13 at 12:18
  • And WHY does it work like that, if it doesn't do anything but set length? – Yurij Jun 06 '13 at 12:19
  • @yttrium: I think Benjamin answered that. You're (inadvertently) disabling an important performance optimization. – T.J. Crowder Jun 06 '13 at 12:19
  • @BenjaminGruenbaum, thank you for that link. But why they implement new Array() like this? – Yurij Jun 06 '13 at 12:23
  • @yttrium I've had to leave (mobile now), but I've asked a friend from the JS room to answer this question. Don't worry, you'll get an answer soon; it makes sense though. – Benjamin Gruenbaum Jun 06 '13 at 12:23
  • @yttrium you spend a lot of time looking for the threshold but is there a reason you don't simply allocate your array as `[]` as you obviously want a "real" array (as is usually the case) ? – Denys Séguret Jun 06 '13 at 12:59
  • @dystroy, a few comments above I said that I wrote that code only for tests and want to understand what happens. – Yurij Jun 06 '13 at 13:18

2 Answers


If you preallocate, do not use .push, because you will create a sparse array backed by a hashtable. You can preallocate sparse arrays of up to 99999 elements and they will still be backed by a C array; beyond that, it's a hashtable.

With the second array you are adding elements in a contiguous way starting from 0, so it will be backed by a real C array, not a hash table.

So roughly:

If your array indices go nicely from 0 to Length-1, with no holes, then it can be represented by a fast C array. If you have holes in your array, then it will be represented by a much slower hash table. The exception is that if you preallocate an array of size < 100000, then you can have holes in the array and still get backed by a C array:

var a = new Array(N);

// If N < 100000, this will not turn the array into a hashtable:
a[50000] = "sparse";

var b = []; // or new Array(N), with N >= 100000
// b will be backed by a hash table:
b[50000] = "sparse";
// b.push("sparse") is roughly the same as above, if you used new Array with N > 0
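A minimal sketch of the benchmark from the question illustrates the point (the function and variable names here are mine, and absolute timings vary wildly between V8 versions; on modern V8 the gap has largely disappeared):

```javascript
// Push n random numbers onto t and return the elapsed time in ms.
function timePush(t, n) {
  const start = Date.now();
  for (let i = 0; i < n; ++i) t.push(Math.random());
  return Date.now() - start;
}

// push() appends AFTER index 199999, leaving 200000 holes at the front,
// so the array ends up with length 400000 and a sparse front half:
const preallocated = new Array(200000);
const msSparse = timePush(preallocated, 200000);

// Contiguous writes starting at index 0 keep the fast backing store:
const plain = [];
const msDense = timePush(plain, 200000);

console.log(msSparse, msDense, preallocated.length, plain.length);
```

Note that push never fills the preallocated slots; it always appends at index length, which is why preallocating and then pushing combines the worst of both worlds.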
Esailija
  • +1 I must upvote you as Benjamin is not here anymore (and you added more matter, too). – Denys Séguret Jun 06 '13 at 12:27
  • @dystroy I'm here, voting from mobile. Nice answer! How does the corresponding code in SpiderMonkey look? – Benjamin Gruenbaum Jun 06 '13 at 12:32
  • @BenjaminGruenbaum Couldn't say, I haven't looked at it ever. – Esailija Jun 06 '13 at 12:34
  • @dystroy, there is no such issue in firefox. – Yurij Jun 06 '13 at 12:34
  • Thank you. Why is there such a difference between pushes on an array created with new Array(180000) and one created with new Array(181000)? – Yurij Jun 06 '13 at 12:55
  • @Esailija Actually what you write is not entirely correct. The ArrayPush built-in tries to keep elements in fast mode if they were in fast mode. See the logic in https://code.google.com/p/v8/source/browse/trunk/src/builtins.cc#547 – Vyacheslav Egorov Jun 06 '13 at 12:56
  • @yttrium the difference happens due to how heuristics for going back into fast mode from dictionary mode work. If your array is in dictionary mode, then every time it needs to grow, V8 checks whether it is dense enough and whether it can win space by using a continuous (C-like) array instead of a dictionary. With 180000 as the starting point the heuristic hits fast, and with 181000 it hits very late. Hence the difference. The heuristic is here: https://code.google.com/p/v8/source/browse/trunk/src/objects.cc?r=14954#12483 – Vyacheslav Egorov Jun 06 '13 at 13:20
  • @VyacheslavEgorov thanks, good to have feedback from someone with true understanding. I am really just making best guesses here, v8 surprises me a lot. – Esailija Jun 06 '13 at 13:41
  • FWIW the 100K threshold was removed in 2015. The current threshold is 32M. – jmrk Sep 02 '22 at 10:50

Seemingly Unlimited Arrays [2020]

In modern V8, arrays can be of almost any size. You can use [] or new Array(len) either way, even with random access.

In current Chrome (and I guess any V8 environment), Arrays can have a length of up to 2^32-1.
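The 2^32-1 cap is easy to observe (a quick sketch; creating the maximum-length array is cheap because V8 does not eagerly allocate a dense backing store for it):

```javascript
// The largest legal array length is 2^32 - 1:
const max = new Array(2 ** 32 - 1);
console.log(max.length); // 4294967295

// One element more is rejected outright with "Invalid array length":
let threw = false;
try {
  new Array(2 ** 32);
} catch (e) {
  threw = e instanceof RangeError;
}
console.log(threw); // true
```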


However, there are a few caveats:

Dictionary-mode Still Applies

As jmrk mentioned in the comments, arrays are not magical beings. Smaller arrays (up to some threshold, apparently a few million elements now) are not sparse at all and only appear to be sparse; they use up actual memory for all their elements. Once the threshold has been reached, arrays fall back into dictionary mode.

They are easier to use now, but they internally still work the same as before.
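One way to see that new Array(n) creates holes rather than real elements (a small sketch):

```javascript
// new Array(3) has length 3 but no own elements — three holes:
const holey = new Array(3);
console.log(0 in holey);         // false — index 0 is a hole
console.log(Object.keys(holey)); // [] — no own indexed properties yet

// Assigning an index turns that hole into a real element:
holey[0] = 'x';
console.log(0 in holey);         // true
```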

You Need to Initialize an Empty Array

Plain for loops work as intended; however, Array's built-in higher-order functions (such as map, filter, find, some, etc.) skip unassigned elements. Such arrays require fill (or some other method of population) first:

const a = new Array(10);
const b = new Array(10).fill(0);

a.forEach(x => console.log(x)); // does nothing
b.forEach(x => console.log(x)); // works as intended
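Besides fill, Array.from is a common way to produce a dense array of a given length (a sketch; note that map on a holey array never even invokes its callback for the holes):

```javascript
// Holes are skipped by the higher-order methods:
const sparse = new Array(3);
console.log(sparse.map(() => 1)); // [ <3 empty items> ] — callback never called

// Array.from materializes every index, so map/forEach see all of them:
const dense = Array.from({ length: 3 }, (_, i) => i * 2);
console.log(dense);                 // [ 0, 2, 4 ]
console.log(dense.map(x => x + 1)); // [ 1, 3, 5 ]
```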

Judging from the Array constructor code, the SetLengthWouldNormalize function and the kMaxFastArrayLength constant, it can now support an almost arbitrarily large number of elements (currently capped at 32 million) before resorting to dictionary mode.

Note, however, that there are many more considerations at play now, as V8 optimization has become ever more complicated. This official blog post from 2017 explains that V8 distinguishes between 21 different kinds of arrays (or rather, array element kinds), and that, to quote:

"each of which comes with its own set of possible optimizations"

If "sparse arrays work, let's leave it at that!" is not good enough for you, the original post below traces where these thresholds live in the V8 source:

Original Post

If you pre-allocate an array with more than 100000 elements in Chrome or Node (or, more generally, in V8), it falls back to dictionary mode, making things uber-slow.

Thanks to some of the comments in this thread, I was able to track things down to object.h's kInitialMaxFastElementArray.

I then used that information to file an issue in the v8 repository which is now starting to gain some traction, but it will still take a while. And I quote:

I hope we'll be able to do this work eventually. But it's still probably a ways away.

Domi
  • (V8 developer here.) There seems to be some confusion here: "sparse array" == "dictionary mode" == "empty chunks don't use up any memory" is all the same thing. The only thing that changed (back in 2015) is that `new Array(N)` now creates non-sparse, non-dictionary backing stores for much bigger `N` (the threshold used to be 100K (not 10K as you claim), and is in the millions now). Other operations can still cause arrays to switch between dense and sparse backing stores at many different capacities. – jmrk Sep 02 '22 at 10:49
  • Thank you for the info - can you sauce ("source") that? – Domi Sep 02 '22 at 11:43
  • For what exactly do you want a source? – jmrk Sep 02 '22 at 15:36
  • @jmrk Is there any official documentation on that? Stuff we can link, so we can make sure people know where to go to verify and understand the information you provided? Or is this only in the source code and not officially documented? – Domi Sep 02 '22 at 16:34
  • The official documentation is at v8.dev/docs. We generally don't officially document internal heuristics and thresholds, because they can change and we wouldn't want anyone to rely on them. Of course you can read the source code if you want. The fact that `new Array(20000)` needs memory for 20K elements can be verified with a DevTools heap snapshot -- when the console says "empty x 10000" then that's just for your reading convenience, it doesn't mean that those elements don't need memory: the backing store is preallocated, so it does reserve 20K entries (80 kilobytes). – jmrk Sep 02 '22 at 18:52
  • @jmrk Thanks for the response. I double checked the link I myself provided above, and indeed, I forgot a `0` for `kInitialMaxFastElementArray` - hah! Also, apologies for using the word `source` ambiguously here. The first time I used it, I meant "source of reference", rather than source code. I shall add the caveat you mentioned! – Domi Sep 03 '22 at 05:52