Is there a mistake in the ECMAScript spec relating to Unicode code points?

Question

When a stream of code points is to be parsed as an ECMAScript Script or Module, it is first converted to a stream of input elements by repeated application of the lexical grammar;

However, in Static Semantics: ParseText it says:

The abstract operation ParseText takes arguments sourceText (a sequence of Unicode code points) and goalSymbol (a nonterminal in one of the ECMAScript grammars). It performs the following steps when called:

The algorithm of Static Semantics: ParseText doesn't mention a step where we apply the Lexical Grammar, that is, where a sequence of Unicode code points is turned into input elements.

So from my understanding, it seems that the algorithm skips the Lexical Grammar phase and constructs a parse tree directly from the Unicode code points.

So it basically contradicts the Syntactic Grammar statement I've quoted above.

Because according to the Syntactic Grammar, we construct a parse tree from the input elements produced by the Lexical Grammar, not from Unicode code points.

Is it a mistake, or have I misunderstood something?

This might be too "opinion-based" to stay open. It may be better to raise it in TC39's chat channels. Sounds to me like the Syntactic Grammar is being overly explicit in its description, but maybe not strictly wrong. The spec is aimed at specifying the behavior of the language, not describing how code must be implemented. From the standpoint of the language itself, there's no difference whether you parse the Unicode code points into input elements and then into parse nodes, or directly into parse nodes, so the spec really doesn't have to care too much. — loganfsmyth, Feb 05 '22 at 06:24

score 2 · Answer 1 · answered Feb 05 '22 at 13:29

It's true that ParseText doesn't mention applying the lexical grammar. But note that it also doesn't mention applying the syntactic grammar. What it says is:

Attempt to parse sourceText using goalSymbol as the goal symbol, [...]

And that's meant to cover any (spec-compliant) activities that the implementation needs to perform to come up with a Parse Node (an instance of goalSymbol that matches sourceText). That will definitely involve applying the lexical grammar to Unicode code points, and might also (if the goalSymbol is from the syntactic grammar) involve applying the syntactic grammar to input elements.
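To make that concrete, here is a minimal sketch of how a single "attempt to parse" step can cover both grammars. It is a toy, not how any real engine or the spec itself is structured: the grammar (sums of integers), the goal symbol name "Sum", the helper names lex, parseSum, and parseText, and the node and token shapes are all invented for illustration. The only part mirrored from the spec is that the result is either a parse node or a non-empty list of SyntaxError objects.

```javascript
// Toy "lexical grammar": turn code points into input elements (tokens).
// Hypothetical helper; token shapes are made up for this sketch.
function* lex(sourceText) {
  // Array.from on a string splits it into code points, not UTF-16 code units.
  const codePoints = Array.from(sourceText);
  let i = 0;
  while (i < codePoints.length) {
    const cp = codePoints[i];
    if (cp === " " || cp === "\t" || cp === "\n") {
      i++; // white space is recognized but produces no token
      continue;
    }
    if (cp >= "0" && cp <= "9") {
      let digits = "";
      while (i < codePoints.length && codePoints[i] >= "0" && codePoints[i] <= "9") {
        digits += codePoints[i++];
      }
      yield { type: "Number", value: Number(digits) };
      continue;
    }
    if (cp === "+") {
      i++;
      yield { type: "Plus" };
      continue;
    }
    throw new SyntaxError(`Unexpected code point ${JSON.stringify(cp)}`);
  }
}

// Toy "syntactic grammar": Sum : Number | Sum + Number
function parseSum(tokens) {
  let pos = 0;
  const expectNumber = () => {
    const token = tokens[pos];
    if (!token || token.type !== "Number") throw new SyntaxError("Expected a number");
    pos++;
    return { type: "NumberNode", value: token.value };
  };
  let node = expectNumber();
  while (pos < tokens.length) {
    if (tokens[pos].type !== "Plus") throw new SyntaxError("Expected '+'");
    pos++;
    node = { type: "SumNode", left: node, right: expectNumber() };
  }
  return node;
}

// One "attempt to parse" step: the caller never sees the tokens.
// Returns a parse node, or a non-empty list of SyntaxError objects.
function parseText(sourceText, goalSymbol) {
  try {
    const inputElements = [...lex(sourceText)]; // lexical grammar applied here
    if (goalSymbol === "Sum") return parseSum(inputElements); // then the syntactic grammar
    throw new SyntaxError(`Unknown goal symbol ${goalSymbol}`);
  } catch (error) {
    if (error instanceof SyntaxError) return [error];
    throw error;
  }
}
```

Calling parseText("1 + 23", "Sum") yields a SumNode whose leaves hold 1 and 23, while parseText("1 +", "Sum") yields a one-element list containing a SyntaxError. From the caller's side, code points go in and a parse node (or errors) comes out; whether tokenization ran as a separate pass or was interleaved with parsing is not observable, which is why ParseText does not need to spell it out.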
