
The lexical grammar of ECMAScript lists the following token classes for the lexical analyzer (lexer):

InputElementDiv::
    WhiteSpace
    LineTerminator
    Comment
    CommonToken
    DivPunctuator
    RightBracePunctuator
InputElementRegExp::
    WhiteSpace
    LineTerminator
    Comment
    CommonToken
    RightBracePunctuator
    RegularExpressionLiteral
InputElementRegExpOrTemplateTail::
    WhiteSpace
    LineTerminator
    Comment
    CommonToken
    RegularExpressionLiteral
    TemplateSubstitutionTail
InputElementTemplateTail::
    WhiteSpace
    LineTerminator
    Comment
    CommonToken
    DivPunctuator
    TemplateSubstitutionTail

While I understand nested classes like WhiteSpace and LineTerminator, I don't understand what the top-level classes are: InputElementDiv, InputElementRegExp, InputElementRegExpOrTemplateTail and InputElementTemplateTail. Can anyone please clarify?

Max Koretskyi
  • Each top level class represents any one of the productions that follows its `::`. Is that what you meant? Does this help? https://www.ecma-international.org/ecma-262/8.0/index.html#sec-lexical-and-regexp-grammars – spanky Aug 16 '17 at 20:28
  • Did you even read the note at the spec section you linked? – Bergi Aug 16 '17 at 21:23
  • @Bergi I'm doing a writeup. I think that part is hard to follow if you don't already know what it's saying. – loganfsmyth Aug 16 '17 at 21:31

Definitely not obvious; I had my own struggle decoding all this at one point. The important note is in https://www.ecma-international.org/ecma-262/8.0/index.html#sec-ecmascript-language-lexical-grammar. Specifically:

There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements. This requires multiple goal symbols for the lexical grammar. The InputElementRegExpOrTemplateTail goal is used in syntactic grammar contexts where a RegularExpressionLiteral, a TemplateMiddle, or a TemplateTail is permitted. The InputElementRegExp goal symbol is used in all syntactic grammar contexts where a RegularExpressionLiteral is permitted but neither a TemplateMiddle, nor a TemplateTail is permitted. The InputElementTemplateTail goal is used in all syntactic grammar contexts where a TemplateMiddle or a TemplateTail is permitted but a RegularExpressionLiteral is not permitted. In all other contexts, InputElementDiv is used as the lexical goal symbol.

with the key part up front:

There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context

Keep in mind that this is the lexical grammar definition, so all it aims to do is produce a sequence of tokens.

So let's break that down more. Consider a snippet like this:

/foo/g

With no context given, there are two ways to interpret this:

  1. DivPunctuator IdentifierName DivPunctuator IdentifierName

    "/" "foo" "/" "g"
    
  2. RegularExpressionLiteral

    "/foo/g"
    

From the standpoint of a lexer, there is not enough information to know which of these to select. This means the lexer needs a flag, something like expectRegex, that toggles its behavior based not just on the current sequence of characters but also on previously encountered tokens. Something needs to say "expect an operator next" or "expect a regex literal next".
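A minimal sketch of that flag in a hand-written lexer, assuming a hypothetical `readSlashToken` helper that the lexer calls whenever it reaches a `/`. The `expectRegex` argument is supplied from outside (in practice by the parser). The regex scan is simplified: it does not validate the pattern body and ignores the `/=` operator.

```javascript
// Hypothetical helper: read one token that starts with "/" at `pos`.
// `expectRegex` is the informal flag from the text above; in a real
// parser it is set by the syntactic context, not by the lexer itself.
function readSlashToken(source, pos, expectRegex) {
  if (!expectRegex) {
    // Operator context: "/" is a DivPunctuator ("/=" is skipped for brevity).
    return { type: "DivPunctuator", value: "/", end: pos + 1 };
  }
  // Regex context: scan to the closing "/", honoring backslash escapes
  // and [...] classes, where an unescaped "/" does not end the literal.
  let i = pos + 1;
  let inClass = false;
  while (i < source.length) {
    const ch = source[i];
    if (ch === "\\") {
      i += 2; // skip the escaped character
      continue;
    }
    if ("\n\r\u2028\u2029".includes(ch)) break; // line terminators end the scan
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    else if (ch === "/" && !inClass) break;
    i++;
  }
  if (i >= source.length || source[i] !== "/") {
    throw new SyntaxError("Unterminated regular expression literal");
  }
  i++; // consume the closing "/"
  // Flags are identifier characters directly after the closing "/".
  while (i < source.length && /[\w$]/.test(source[i])) i++;
  return { type: "RegularExpressionLiteral", value: source.slice(pos, i), end: i };
}

// Same characters, two different tokens depending on the flag:
const div = readSlashToken("/foo/g", 0, false);
console.log(div.type, div.value); // DivPunctuator /
const re = readSlashToken("/foo/g", 0, true);
console.log(re.type, re.value); // RegularExpressionLiteral /foo/g
```

Note that nothing inside `readSlashToken` can decide the flag on its own; the characters `/foo/g` are identical in both calls.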

The same is true for the following:

}foo${

  1. RightBracePunctuator IdentifierName Punctuator

    "}" "foo$" "{"
    
  2. TemplateMiddle

    "}foo${"
    

A second toggle needs to be used for this case.
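The second toggle works the same way for `}`. Again this is only a sketch, with a hypothetical `readRightBraceToken` helper and an `expectTemplate` argument supplied by the caller; it returns the raw source text of the template chunk and does not process escape sequences.

```javascript
// Hypothetical helper: read one token that starts with "}" at `pos`.
// `expectTemplate` is true only when the parser has just finished the
// expression inside a template substitution `${ ... }`.
function readRightBraceToken(source, pos, expectTemplate) {
  if (!expectTemplate) {
    return { type: "RightBracePunctuator", value: "}", end: pos + 1 };
  }
  // Template context: the "}" resumes the template text, which runs until
  // the next "${" (TemplateMiddle) or the closing backtick (TemplateTail).
  let i = pos + 1;
  while (i < source.length) {
    const ch = source[i];
    if (ch === "\\") {
      i += 2; // escapes such as \` or \$ never end the chunk
      continue;
    }
    if (ch === "`") {
      return { type: "TemplateTail", value: source.slice(pos, i + 1), end: i + 1 };
    }
    if (ch === "$" && source[i + 1] === "{") {
      return { type: "TemplateMiddle", value: source.slice(pos, i + 2), end: i + 2 };
    }
    i++;
  }
  throw new SyntaxError("Unterminated template literal");
}

console.log(readRightBraceToken("}foo${", 0, false).type); // RightBracePunctuator
const middle = readRightBraceToken("}foo${", 0, true);
console.log(middle.type, middle.value); // TemplateMiddle }foo${
```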

So that leaves us with a nice table of the four options that you've seen:

| expectRegex | expectTemplate | InputElement                     |
| ----------- | -------------- | -------------------------------- |
| false       | false          | InputElementDiv                  |
| false       | true           | InputElementTemplateTail         |
| true        | false          | InputElementRegExp               |
| true        | true           | InputElementRegExpOrTemplateTail |
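The table is just a two-flag lookup. Written out as code (the flag names are still the informal ones from this answer, not spec identifiers), together with what each goal does to a `/` or `}` according to the productions listed in the question:

```javascript
// Two informal flags -> one lexical goal symbol (the table above).
function lexicalGoal(expectRegex, expectTemplate) {
  if (expectRegex && expectTemplate) return "InputElementRegExpOrTemplateTail";
  if (expectRegex) return "InputElementRegExp";
  if (expectTemplate) return "InputElementTemplateTail";
  return "InputElementDiv";
}

// The reverse view: which production a "/" (not starting a comment) or a
// "}" comes from under each goal. A goal that includes
// RegularExpressionLiteral has no DivPunctuator, and one that includes
// TemplateSubstitutionTail has no RightBracePunctuator, so there is
// never a choice left for the lexer to make.
function interpretation(goal) {
  const regex = goal === "InputElementRegExp" || goal === "InputElementRegExpOrTemplateTail";
  const template = goal === "InputElementTemplateTail" || goal === "InputElementRegExpOrTemplateTail";
  return {
    slash: regex ? "RegularExpressionLiteral" : "DivPunctuator",
    rightBrace: template ? "TemplateSubstitutionTail" : "RightBracePunctuator",
  };
}

for (const expectRegex of [false, true]) {
  for (const expectTemplate of [false, true]) {
    const goal = lexicalGoal(expectRegex, expectTemplate);
    const { slash, rightBrace } = interpretation(goal);
    console.log(`${goal}: "/" -> ${slash}, "}" -> ${rightBrace}`);
  }
}
```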

And the spec then covers when these flags toggle:

  • InputElementRegExpOrTemplateTail: This goal is used in syntactic grammar contexts where a RegularExpressionLiteral, a TemplateMiddle, or a TemplateTail is permitted.
  • InputElementRegExp: This goal symbol is used in all syntactic grammar contexts where a RegularExpressionLiteral is permitted but neither a TemplateMiddle, nor a TemplateTail is permitted.
  • InputElementTemplateTail: This goal is used in all syntactic grammar contexts where a TemplateMiddle or a TemplateTail is permitted but a RegularExpressionLiteral is not permitted.
  • InputElementDiv: This goal is used in all other contexts.
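"Contexts where a RegularExpressionLiteral is permitted" means it is the *parser* that picks the goal, because only the parser knows whether an operand or an operator comes next. A minimal sketch of that split, assuming a made-up two-rule grammar (Expression := Primary ( "/" Primary )*, Primary := IdentifierName | RegularExpressionLiteral). `createLexer`, `parseExpression` and `parsePrimary` are hypothetical names, and the regex scanner is deliberately simplified (no `[...]` classes, no flag validation).

```javascript
// Toy lexer: the caller passes the goal symbol for every token it asks for.
function createLexer(source) {
  let pos = 0;
  return {
    next(goal) {
      while (pos < source.length && /\s/.test(source[pos])) pos++; // skip whitespace
      if (pos >= source.length) return { type: "EOF", value: "" };
      const rest = source.slice(pos);
      if (rest[0] === "/") {
        if (goal === "InputElementDiv") {
          pos += 1;
          return { type: "DivPunctuator", value: "/" };
        }
        // Goal is InputElementRegExp: read a simplified regex literal
        // (escapes allowed, [...] classes not handled).
        const m = /^\/(?:\\.|[^\\\/\n])+\/[A-Za-z]*/.exec(rest);
        if (!m) throw new SyntaxError(`Invalid regular expression at ${pos}`);
        pos += m[0].length;
        return { type: "RegularExpressionLiteral", value: m[0] };
      }
      const id = /^[A-Za-z_$][\w$]*/.exec(rest);
      if (id) {
        pos += id[0].length;
        return { type: "IdentifierName", value: id[0] };
      }
      throw new SyntaxError(`Unexpected character ${rest[0]} at ${pos}`);
    },
  };
}

// Toy grammar:  Expression := Primary ( "/" Primary )*
//               Primary    := IdentifierName | RegularExpressionLiteral
function parseExpression(source) {
  const lexer = createLexer(source);
  // An operand may start here, so a regex literal is permitted.
  const operands = [parsePrimary(lexer.next("InputElementRegExp"))];
  for (;;) {
    // After a complete operand only an operator can follow: "/" is division.
    const tok = lexer.next("InputElementDiv");
    if (tok.type === "EOF") return operands;
    if (tok.type !== "DivPunctuator") throw new SyntaxError(`Expected "/" but got ${tok.value}`);
    operands.push(parsePrimary(lexer.next("InputElementRegExp")));
  }
}

function parsePrimary(tok) {
  if (tok.type === "IdentifierName" || tok.type === "RegularExpressionLiteral") return tok;
  throw new SyntaxError(`Unexpected ${tok.type}`);
}

console.log(parseExpression("a / b / g").map((t) => t.type).join(" "));
// IdentifierName IdentifierName IdentifierName
console.log(parseExpression("/foo/g / x").map((t) => t.type).join(" "));
// RegularExpressionLiteral IdentifierName
```

In this sketch the lexer never inspects earlier tokens; the decision lives entirely in `parseExpression`, which is the division of labor the multiple goal symbols describe.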
loganfsmyth
  • thanks a lot for your answer! I've just started looking at the spec from the compilation perspective. Hope you're active on stackoverflow and will help with deciphering :). I have a question though. Why is `CommonToken` in both `InputElementDiv` and `InputElementRegExp`? If you think I should better ask another question about that, let me know. Or maybe point me to where I should read about that. Appreciate! – Max Koretskyi Aug 17 '17 at 15:03
  • I guess the short answer is because it can be? There's no conflict in that case, so it doesn't make a difference. – loganfsmyth Aug 17 '17 at 15:20
  • I'm one of the maintainers of Babel :D – loganfsmyth Aug 17 '17 at 19:35
  • @loganfsmyth So where does it say when you need to select a concrete grammar from the ones listed? – MaximPro Apr 08 '18 at 04:43
  • @MaximPro I don't understand. Doesn't the text I quoted make that fairly clear? "There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements. This requires multiple goal symbols for the lexical grammar." – loganfsmyth Apr 08 '18 at 04:53
  • @loganfsmyth Well, I mean the specification, which specifically should say what grammar to choose. I did not find this in the specification, so I'm asking. – MaximPro Apr 08 '18 at 05:02
  • @MaximPro That text is from the specification. It's explaining in words specifically how to choose which to use. – loganfsmyth Apr 08 '18 at 18:37
  • @loganfsmyth I understand that when one grammar does not fit, another one is used. But how do you know when a context occurs that requires changing the grammar? From your example, I realized only one thing: when using `""` this creates a special context for RegExp or TemplateTail. Tell me, I do not understand when and under what conditions the grammar changes. For example: `This goal is used in syntactic grammar contexts where a RegularExpressionLiteral, a TemplateMiddle, or a TemplateTail is permitted.` What does "permitted" mean here? When and where is it permitted? – MaximPro Apr 08 '18 at 22:12
  • "Tell me, I do not understand when and under what conditions the grammar changes" That is defined by the grammar of the language. So for instance `InputElementRegExp` or `InputElementRegExpOrTemplateTail` would be used anywhere a PrimaryExpression (https://www.ecma-international.org/ecma-262/8.0/#prod-PrimaryExpression) would be valid, since that is the syntactic context where a `RegularExpressionLiteral` would occur. Then between those two token grammars it would depend on whether you were currently processing the TemplateLiteral grammar (https://www.ecma-international.org/ecma-262/8.0/#prod-TemplateLiteral) or not. – loganfsmyth Apr 08 '18 at 23:57
  • @loganfsmyth Can you give some examples? It's hard to imagine; I've just never dealt with this. If InputElementRegExp or InputElementRegExpOrTemplateTail is used everywhere, then when is InputElementDiv used? I need more information; it's really difficult to understand. – MaximPro Apr 09 '18 at 03:34