
Help! I'm learning to love Javascript after programming in C# for quite a while but I'm stuck learning to love the iterable protocol!

Why did JavaScript adopt a protocol that requires creating a new object for each iteration? Why have next() return a new object with properties done and value, instead of adopting a protocol like C#'s IEnumerable and IEnumerator, which allocates no object at the expense of requiring two calls (one to MoveNext() to see whether the iteration is done, and a second to Current to get the value)?
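For a concrete comparison, here is a rough JavaScript sketch of both styles over an array. `ArrayEnumerator` is a hypothetical class that mimics C#'s `IEnumerator` (JavaScript has no such built-in); `arrayIterator` is a hand-written iterator following the real protocol, which in this naive form allocates a fresh `{ value, done }` object on every call. Both names are illustrative only.

```javascript
// Hypothetical C#-style enumerator: moveNext() + current, no per-step allocation.
class ArrayEnumerator {
  constructor(array) {
    this.array = array;
    this.index = -1;
  }
  // Advances; returns false once the sequence is exhausted.
  moveNext() {
    this.index++;
    return this.index < this.array.length;
  }
  // Only meaningful after moveNext() returned true.
  get current() {
    return this.array[this.index];
  }
}

// JavaScript iterator protocol: a single next() method that returns
// a new result object on each call.
function arrayIterator(array) {
  let index = 0;
  return {
    next() {
      return index < array.length
        ? { value: array[index++], done: false }
        : { value: undefined, done: true };
    },
  };
}

function sumEnumerator(e) {
  let total = 0;
  while (e.moveNext()) total += e.current; // two calls per element
  return total;
}

function sumIterator(it) {
  let total = 0;
  for (let r = it.next(); !r.done; r = it.next()) total += r.value; // one call, one object
  return total;
}

console.log(sumEnumerator(new ArrayEnumerator([1, 2, 3]))); // 6
console.log(sumIterator(arrayIterator([1, 2, 3])));        // 6
```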

Are there under-the-hood optimizations that skip the allocation of the object returned by next()? Hard to imagine, given the iterator doesn't know how the object will be used once returned...

Generators don't seem to reuse the result object returned by next(), as illustrated below:

```javascript
function* generator() {
  yield 0;
  yield 1;
}

var iterator = generator();
var result0 = iterator.next();
var result1 = iterator.next();

console.log(result0.value); // 0
console.log(result1.value); // 1
```
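Comparing the result objects by identity, rather than only by value, makes the same point more directly. A minimal check, assuming a spec-conforming engine (per the spec text Bergi links in the comments, a generator creates a new result object each time it is resumed):

```javascript
function* generator() {
  yield 0;
  yield 1;
}

const iterator = generator();
const result0 = iterator.next();
const result1 = iterator.next();

// Each call to next() produced a distinct result object...
console.log(result0 === result1); // false
// ...so the earlier result is not overwritten by the later one.
console.log(result0.value, result1.value); // 0 1
```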

Hm, here's a clue (thanks to Bergi!):

We will answer one important question later (in Sect. 3.2): Why can iterators (optionally) return a value after the last element? That capability is the reason for elements being wrapped. Otherwise, iterators could simply return a publicly defined sentinel (stop value) after the last element.

And in Sect. 3.2 they discuss using generators as lightweight threads. It seems to say the reason for returning an object from next() is so that a value can be returned even when done is true! Whoa. Furthermore, generators can return values in addition to yield-ing and yield*-ing values, and a value produced by return ends up in value when done is true!
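A small sketch of that behavior: a generator's `return` value shows up only in the final `{ done: true }` result, and `yield*` evaluates to that value, which is what lets one generator call another like a subroutine. `task` and `runner` are made-up names for illustration.

```javascript
// A generator that yields intermediate steps and returns a final result.
function* task() {
  yield "step 1";
  yield "step 2";
  return "task result"; // only visible in the { done: true } result
}

// yield* forwards task's yielded values, then evaluates to its return value.
function* runner() {
  const result = yield* task();
  yield `task finished with: ${result}`;
}

const it = task();
const results = [it.next(), it.next(), it.next()];
console.log(results.map((r) => `${r.value}, ${r.done}`).join(" | "));
// step 1, false | step 2, false | task result, true

// Spread drives runner() to completion; runner's own (undefined) return
// value is discarded.
console.log([...runner()].join(" | "));
// step 1 | step 2 | task finished with: task result
```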

And all this allows for pseudo-threading. And that feature, pseudo-threading, is worth allocating a new object on every trip around the loop... JavaScript. Always so unexpected!


Although, now that I think about it, allowing yield* to "return" a value to enable pseudo-threading still doesn't justify returning an object. The IEnumerator protocol could be extended to expose a value after moveNext() returns false: just add a property hasCurrent, checked after the iteration is complete, which when true indicates that current holds a valid value...
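Here is one way the proposed extension might look. `ReturningEnumerator`, `moveNext`, `current` and `hasCurrent` are hypothetical names modeled on the C# members; nothing like this exists in JavaScript. The enumerator produces the array items and then, after `moveNext()` returns false, optionally exposes a completion value through `current`, mirroring the `yield 0; return 1;` generator discussed in this question.

```javascript
// Hypothetical C#-style enumerator extended with hasCurrent, as proposed
// above. None of these names are a real JavaScript API.
class ReturningEnumerator {
  // completion: optional { value } object standing in for a "return value".
  constructor(items, completion) {
    this.items = items;
    this.completion = completion;
    this.index = -1;
  }

  // Advances; returns false once the items are exhausted. No allocation.
  moveNext() {
    if (this.index < this.items.length) this.index++;
    return this.index < this.items.length;
  }

  get finished() {
    return this.index >= this.items.length;
  }

  // Before the first moveNext(): false. While iterating: true.
  // After completion: true only if a completion value was supplied.
  get hasCurrent() {
    if (this.index < 0) return false;
    return !this.finished || this.completion !== undefined;
  }

  get current() {
    if (!this.finished) return this.items[this.index];
    return this.completion === undefined ? undefined : this.completion.value;
  }
}

// Rough equivalent of: function* g() { yield 0; return 1; }
const e = new ReturningEnumerator([0], { value: 1 });
const seen = [];
while (e.moveNext()) seen.push(e.current);
if (e.hasCurrent) seen.push(`returned ${e.current}`);
console.log(seen.join(", ")); // 0, returned 1
```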

And the compiler optimizations are non-trivial. This could result in wide variance in the performance of an iterator... doesn't that cause problems for library implementors?

All these points are raised in this thread, discovered by the friendly SO community. Yet those arguments didn't seem to carry the day.


However, regardless of whether an object is returned, no one is going to be checking for a value after iteration is "complete", right? E.g. most everyone would think the following logs all values returned by an iterator:

```javascript
function logIteratorValues(iterator) {
  var next;
  while (next = iterator.next(), !next.done)
    console.log(next.value);
}
```

Except it doesn't, because even when done is true the iterator may still have returned a final value. Consider:

```javascript
function* generator() {
  yield 0;
  return 1;
}

var iterator = generator();
var result0 = iterator.next();
var result1 = iterator.next();

console.log(`${result0.value}, ${result0.done}`); // 0, false
console.log(`${result1.value}, ${result1.done}`); // 1, true
```

Is an iterator that returns a value after it is "done" really an iterator? What is the sound of one hand clapping? It just seems quite odd...
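To see that final value, a consumer has to keep the last result object and inspect it after the loop ends. A minimal sketch; `collectIteratorValues` is a hypothetical helper, and it cannot tell an explicit `return undefined` apart from no `return` at all. Built-in consumers such as spread and `for...of` simply discard the return value.

```javascript
// Hypothetical helper: collects yielded values plus the value carried by
// the final { done: true } result, if any.
function collectIteratorValues(iterator) {
  const values = [];
  let next;
  while (!(next = iterator.next()).done) {
    values.push(next.value);
  }
  // Falling off the end and `return undefined` both leave value undefined,
  // so the two cases cannot be told apart here.
  if (next.value !== undefined) values.push(next.value);
  return values;
}

function* generator() {
  yield 0;
  return 1;
}

console.log(collectIteratorValues(generator()).join(", ")); // 0, 1
console.log([...generator()].join(", "));                   // 0
```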


And here is an in-depth post on generators I enjoyed. Much of it is spent on controlling the flow of an application rather than iterating over the members of a collection.


Another possible explanation is that IEnumerable/IEnumerator requires two interfaces and three methods, and the JS community preferred the simplicity of a single method. That way they wouldn't have to introduce the notion of groups of symbolic methods, aka interfaces...
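For comparison, a complete hand-written iterable needs only two duck-typed pieces: a `[Symbol.iterator]()` method (roughly the counterpart of `GetEnumerator()`) and the `next()` method on the object it returns. Nothing has to be declared as implementing an interface; `for...of` and spread just look the methods up at runtime. `range` is an illustrative helper, not a built-in.

```javascript
// Illustrative iterable over the integers in [start, end).
function range(start, end) {
  return {
    // Called by for...of / spread to obtain a fresh iterator.
    [Symbol.iterator]() {
      let current = start;
      return {
        next() {
          return current < end
            ? { value: current++, done: false }
            : { value: undefined, done: true };
        },
      };
    },
  };
}

const values = [];
for (const n of range(0, 3)) values.push(n);
console.log(values.join(", "));           // 0, 1, 2
console.log([...range(5, 7)].join(", ")); // 5, 6
```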

Christopher King
  • Can you link to the spec where it says that a *new* object needs to be returned? – Felix Kling Dec 22 '18 at 10:10
  • You'll likely not get an answer about specific language design decisions here, since the people who work on the spec are not here. You should reach out to them directly. – Felix Kling Dec 22 '18 at 10:11
  • Originally, for my own edification, I ported parts of the C# LINQ extension family to Javascript but using a `moveNext()` and `current` iteration protocol. I'm considering re-implementing them using `iterable` so I can make use of generators. But now I realize that would mean allocating an object for each iteration. And I just couldn't understand why JS would choose such a memory-heavy iteration approach. But maybe you answered my question: when one chooses Javascript, one shouldn't worry about what happens under the hood! Otherwise, choose C! – Christopher King Dec 22 '18 at 10:11
  • MDN is not an authoritative source. Anybody can edit the content there. But even this text doesn't say that a **new** object has to be returned in every iteration. – Felix Kling Dec 22 '18 at 10:14
  • @FelixKling [Here](http://www.ecma-international.org/ecma-262/9.0/#sec-createiterresultobject) the spec creates a new object. Of course you can only test this by storing old result objects and comparing them, which you normally wouldn't do. – Bergi Dec 22 '18 at 10:16
  • @FelixKling I think MDN [does say](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols#The_iterator_protocol) that a new object `{done: boolean, value: any}` must be returned. – Nurbol Alpysbayev Dec 22 '18 at 10:16
  • @Bergi: That's what I was looking for. – Felix Kling Dec 22 '18 at 10:20
  • @NurbolAlpysbayev: As far as I can tell it only says it needs to return an object, not that it has to return a *different* object in each iteration. – Felix Kling Dec 22 '18 at 10:23
  • @Bergi: Actually, that only describes the behavior of built-in iterators. The protocol itself doesn't seem to require a new object in each iteration. – Felix Kling Dec 22 '18 at 10:24
  • @FelixKling Hmm, you are right, I didn't think about the same object can be mutated and returned again. – Nurbol Alpysbayev Dec 22 '18 at 10:25
  • @FelixKling Oh, you mean the protocol requirements. Right then. – Bergi Dec 22 '18 at 10:26
  • FWIW, here is an example that reuses the result object: https://jsfiddle.net/wp82n07o/ . The [specification of the protocol](https://www.ecma-international.org/ecma-262/9.0/#sec-operations-on-iterator-objects) doesn't seem to require that a *different* object is returned in each iteration (as far as I can see). So it seems you can get away with only allocating one. However, as I have mentioned before, I would reach out to the people from the TC39 committee if you want clarification on that. – Felix Kling Dec 22 '18 at 10:33
  • Yes, I could see reusing the object. However the generator protocol doesn't reuse the object so I'm guessing I shouldn't do that either... – Christopher King Dec 22 '18 at 10:34
  • @FelixKling cool! So simple and demonstrative. I was too lazy to do that... – Nurbol Alpysbayev Dec 22 '18 at 10:36
  • @Christopher: Yep, that's a good point. Maybe I'm missing something in the spec... or they didn't consider your example as a realistic use case. – Felix Kling Dec 22 '18 at 10:37
  • @FelixKling Here's some discussion: https://esdiscuss.org/topic/is-an-iterator-allowed-to-reuse-the-same-state-object, https://esdiscuss.org/topic/iterator-next-method-returning-new-object. Also I found that reusing the object makes escape analysis harder for the compiler... – Bergi Dec 22 '18 at 10:45
  • @Bergi good stuff! – Felix Kling Dec 22 '18 at 10:54
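A sketch in the spirit of the reuse example linked in the comments (not a copy of it): `reusingRange` is a hypothetical iterator that mutates and returns a single result object. Consumers that read `value` immediately, like spread, behave normally; code that holds on to earlier results, like the question's `result0`/`result1` example, sees them change underneath it, which is the hazard raised in the comments.

```javascript
// Hypothetical iterator that returns the SAME result object on every call.
function reusingRange(end) {
  const result = { value: undefined, done: false };
  let i = 0;
  return {
    [Symbol.iterator]() {
      return this;
    },
    next() {
      if (i < end) {
        result.value = i++;
      } else {
        result.value = undefined;
        result.done = true;
      }
      return result; // same object each time, only its fields change
    },
  };
}

// Fine for consumers that read value immediately:
console.log([...reusingRange(3)].join(", ")); // 0, 1, 2

// Surprising for code that keeps result objects around:
const it = reusingRange(3);
const r0 = it.next();
const r1 = it.next();
console.log(r0 === r1, r0.value); // true 1
```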

2 Answers


Are there under-the-hood optimizations that skip the allocation of the object returned by next()?

Yes. Those iterator result objects are small and usually short-lived. Particularly in for … of loops, the compiler can do a trivial escape analysis to see that the object is never exposed to user code at all (only to the internal loop evaluation code). Such objects can be handled very efficiently by the garbage collector, or even be allocated directly on the stack.
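A simplified model of how `for...of` drives an iterator shows why escape analysis is plausible here: the result object is read only by the loop machinery, and user code receives just `result.value`. This is a sketch, not the spec algorithm (`forOf` is a hypothetical name); real `for...of` also calls `iterator.return()` on early exit and throws if a result is not an object. Whether a given engine actually removes the allocation is not established by this.

```javascript
// Simplified model of: for (const x of iterable) body(x);
function forOf(iterable, body) {
  const iterator = iterable[Symbol.iterator]();
  while (true) {
    const result = iterator.next(); // result object created here...
    if (result.done) break;
    body(result.value);             // ...but only its value reaches user code
  }
}

const seen = [];
forOf(["a", "b", "c"], (x) => seen.push(x));
console.log(seen.join("")); // abc
```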

Here are some sources:

Bergi
  • Cool! I could see that at the call site the compiler could figure out the object is short-lived, but how does the callee/generator know not to allocate the object? Or does the compiler generate two versions of the callee/generator, one that returns an object and one that uses a no-allocation protocol? – Christopher King Dec 22 '18 at 10:16
  • @ChristopherKing Tbh, I don't know. What I wrote above is just speculation, it's what we could reasonably expect from a compiler, to reason why the protocol as specified is not as bad as you made it sound. I'm gonna search for authoritative sources on whether engines actually implement these optimisations... – Bergi Dec 22 '18 at 10:20
  • Thanks. The runtime could see it's short lived and so put it into a gen0 heap but still -- an object is allocated on the heap for each iteration of the loop... – Christopher King Dec 22 '18 at 10:22
  • @ChristopherKing JS operates millions of objects per second or something. So you are concerned about micro/nano-optimizations, I guess. – Nurbol Alpysbayev Dec 22 '18 at 10:24
  • I guess I am. Maybe I shouldn't be. Maybe to achieve Javascript Zen, I must release my attachment to memory allocations... – Christopher King Dec 22 '18 at 10:29
  • @ChristopherKing It depends on what areas you like to work in. For example, Petka Antonov is a performance fanatic and created a beautiful, super performant Promise library called Bluebird. I think there is actually a lack of such people in JS, so think again :) – Nurbol Alpysbayev Dec 22 '18 at 10:32
  • Awesome! Well, I surely hope a JS performance fanatic will chime in and let us know why JS chose to allocate an object per loop iteration! In the meantime, I'll check out Bluebird. Thanks! – Christopher King Dec 22 '18 at 10:37
  • @ChristopherKing Bluebird is mostly history and an example of how native stuff (native JS Promises) can be much slower (100 times IIRC?). Nowadays I think native promises are as fast. But you definitely should read about [that story](https://softwareengineering.stackexchange.com/questions/278778/why-are-native-es6-promises-slower-and-more-memory-intensive-than-bluebird), very interesting! – Nurbol Alpysbayev Dec 22 '18 at 10:39
  • @ChristopherKing They chose to make the iteration protocol *simple* in the first place, only specifying how it should work, and trusted engine implementations to optimise it. (The simpler the semantics in the spec, the more trivial the written code, and the easier it is for a compiler to optimise.) Notice that the ECMAScript specification does not talk about allocations/deallocations at all. – Bergi Dec 22 '18 at 10:53
  • Wow Bergi! How did you _find_ all this stuff!! – Christopher King Dec 22 '18 at 11:12
  • @ChristopherKing Searching with the right keywords :-) `es-discuss` (where most architectural decisions are made/explained), `v8` (which has a clever optimising compiler and blogs about it), plus `iterator result` and `performance`. – Bergi Dec 22 '18 at 11:41
  • The last points you mention are false. I made a benchmark and let it run in node.js v16 and v20. Results: 1. Sync generator functions are 3-4 x _slower_ than a pure iterator. I was surprised by this, too. - 2. Creating one result object per iterator and only resetting its value on each next() call is again 2-3 x faster than mindless return { done: false, value }. 3. Even this optimized implementation is still 3-4 x slower than a normal for loop. – Andi Aug 06 '23 at 04:21
  • @Andi I guess it depends a lot on what the particular iterator actually does. Can you share your code? Sounds very interesting! Notice that the last point actually refers to collection iterators like `arr.values()`, not to generator functions. – Bergi Aug 06 '23 at 09:40
  • @Bergi https://jsfiddle.net/AndiTR/5oz9aw8y/12/ - Browsers (Chrome v109, FF 115) confirm my results: 1. A (raw) iterator chain (with reused result objects) is 8 x faster than a generator iterator. 2. A chain (sequence + map) with my direct protocol is 2.5 to 4 x faster than the regular protocol. – Andi Aug 08 '23 at 21:26
  • @Bergi Updated link to the latest version: https://jsfiddle.net/AndiTR/5oz9aw8y/latest – Andi Aug 09 '23 at 09:08
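Andi's fiddle is not reproduced here, but a rough harness along the same lines might look like the following. It compares a generator, an iterator that allocates a fresh result per step, and one that reuses a single result object. The function names are illustrative, `performance.now()` assumes a browser or a recent Node.js, and a single cold run like this is heavily affected by JIT warm-up, so any timings it prints are indicative at best.

```javascript
// Three producers of 0..n-1.
function* genRange(n) {
  for (let i = 0; i < n; i++) yield i;
}

function freshRange(n) {
  let i = 0;
  return {
    // New result object on every call.
    next: () => (i < n ? { value: i++, done: false } : { value: undefined, done: true }),
  };
}

function reusedRange(n) {
  let i = 0;
  const result = { value: undefined, done: false };
  return {
    // Same result object on every call.
    next() {
      if (i < n) {
        result.value = i++;
      } else {
        result.value = undefined;
        result.done = true;
      }
      return result;
    },
  };
}

// Consume with a manual next() loop so all three are driven identically.
function sum(iterator) {
  let total = 0;
  for (let r = iterator.next(); !r.done; r = iterator.next()) total += r.value;
  return total;
}

function time(label, makeIterator, n) {
  const start = performance.now();
  const total = sum(makeIterator(n));
  const ms = performance.now() - start;
  console.log(`${label}: sum=${total}, ${ms.toFixed(1)} ms`);
  return total;
}

const N = 1e6;
time("generator", genRange, N);
time("fresh result objects", freshRange, N);
time("reused result object", reusedRange, N);
```

Driving all three through the same `sum` loop keeps the consumer identical, so differences come from the producers; a more careful comparison would repeat runs and discard warm-up iterations.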

Bergi answered already, and I've upvoted, I just want to add this:

Why should you even be concerned about new object being returned? It looks like:

{done: boolean, value: any}

You know, you are going to use the value anyway, so that part is not extra memory overhead. What's left? done: boolean and the object itself, which cost on the order of a machine word (8 bytes) each and can be allocated and processed by the CPU in a few nanoseconds (and likely with even less overhead in practice, given V8's optimizations). Now if you still care about wasting that amount of time and memory, then you really should consider switching from JS to something like Rust+WebAssembly.

Matthias
Nurbol Alpysbayev
  • The overhead is the object itself... But maybe you're right. Maybe one simply shouldn't worry about memory pressure if one is coding Javascript. Still, it just seems that allocating an object for _each iteration_ is memory gluttony even for a dynamic language! – Christopher King Dec 22 '18 at 10:26
  • @ChristopherKing https://jsfiddle.net/AndiTR/5oz9aw8y/latest – Andi Aug 09 '23 at 09:08