
The excerpts below are from the ECMAScript 2017 specification.

11.8.4 String Literals, Note 1

A string literal is zero or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. .... Any code points may appear in the form of an escape sequence.
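Note 1 can be checked directly in an engine. The minimal sketch below, which assumes any ES2017-or-later engine such as Node.js, writes the same code point once literally and then through each escape form the grammar allows. The variable names are illustrative only.

```javascript
// U+0061 (LATIN SMALL LETTER A), written four ways.
const literal = "a";      // the code point itself, between the quotes
const hex = "\x61";       // HexEscapeSequence: x HexDigit HexDigit
const unicode = "\u0061"; // UnicodeEscapeSequence: u Hex4Digits
const braced = "\u{61}";  // UnicodeEscapeSequence: u{ HexDigits } (ES2015+)

// All four literals produce the same String value.
console.log(literal === hex && hex === unicode && unicode === braced); // true

// A code point that cannot appear literally (the closing quote) can still
// appear as an escape sequence.
const quote = "say \"hi\"";
console.log(quote.length); // 8

// Escapes also reach code points outside the BMP; U+1F600 becomes a
// surrogate pair, i.e. two UTF-16 code units in the String value.
const emoji = "\u{1F600}";
console.log(emoji.length);              // 2
console.log(emoji === "\uD83D\uDE00");  // true
```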

11.8.4 String Literals, Syntax

The nonterminal symbol EscapeSequence has the following lexical grammar production:

EscapeSequence ::
    CharacterEscapeSequence
    0 [lookahead ∉ DecimalDigit]
    HexEscapeSequence
    UnicodeEscapeSequence
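A rough illustration of each alternative of EscapeSequence, assuming a modern engine such as Node.js. The `examples` table and the `strictRejects` helper are hypothetical names used only for this sketch; the last two lines show why the `0 [lookahead ∉ DecimalDigit]` restriction matters.

```javascript
// One example per alternative of EscapeSequence: [label, value, code unit].
const examples = [
  ["\\n     CharacterEscapeSequence", "\n", 0x000a],
  ["\\0     0 [lookahead ∉ DecimalDigit]", "\0", 0x0000],
  ["\\x41   HexEscapeSequence", "\x41", 0x0041],
  ["\\u0041 UnicodeEscapeSequence", "\u0041", 0x0041],
];

for (const [label, value, expected] of examples) {
  // Each of these escapes contributes exactly one code unit.
  console.log(label, value.length === 1 && value.charCodeAt(0) === expected);
}

// Hypothetical helper: does the engine reject this function body with a
// SyntaxError?
function strictRejects(source) {
  try {
    new Function(source);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// "\0" followed by a digit is not the \0 production. In strict code it is
// an early error; sloppy code falls back to Annex B legacy octal escapes.
console.log(strictRejects('"use strict"; return "\\01";')); // true
console.log(strictRejects('"use strict"; return "\\0";'));  // false
```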

The nonterminal symbol CharacterEscapeSequence has the following lexical grammar production:

CharacterEscapeSequence ::
    SingleEscapeCharacter
    NonEscapeCharacter
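The two alternatives of CharacterEscapeSequence behave quite differently. In this sketch (the `single` table is an illustrative name), each SingleEscapeCharacter maps to a specific code unit, while a NonEscapeCharacter, meaning any source character other than an EscapeCharacter (a single-escape letter, a decimal digit, x or u) or a line terminator, simply stands for itself.

```javascript
// SingleEscapeCharacter: ' " \ b f n r t v, each mapped to one code unit.
const single = {
  "\\b": ["\b", 0x08],
  "\\t": ["\t", 0x09],
  "\\n": ["\n", 0x0a],
  "\\v": ["\v", 0x0b],
  "\\f": ["\f", 0x0c],
  "\\r": ["\r", 0x0d],
  '\\"': ["\"", 0x22],
  "\\'": ["\'", 0x27],
  "\\\\": ["\\", 0x5c],
};
for (const [source, [value, code]] of Object.entries(single)) {
  console.log(source, value.charCodeAt(0) === code); // true for every entry
}

// NonEscapeCharacter: the backslash is dropped and the character remains.
console.log("\c" === "c", "\q" === "q", "\%" === "%"); // true true true
```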

11.8.4.3 Static Semantics: SV

This section contains descriptions such as:

The SV of DoubleStringCharacter :: \ EscapeSequence is the SV of the EscapeSequence
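To see how these SV rules chain together, here is a rough, hypothetical evaluator (`svOfBody` is not a real API). It takes the source text between the quotes of a double-quoted literal, decides which production each piece matches, and appends that piece's SV, following the ES2017 rules. It is a sketch under stated assumptions: Annex B legacy octal escapes are not modelled, and it is cross-checked against the engine's own parser via `new Function`.

```javascript
// Hypothetical evaluator of SV for the body of a double-quoted literal
// (ES2017 rules; Annex B legacy octal escapes such as \1 are not modelled).
const LINE_TERMINATORS = "\n\r\u2028\u2029";
const SINGLE_ESCAPES = new Map([
  ["'", "'"], ['"', '"'], ["\\", "\\"],
  ["b", "\b"], ["f", "\f"], ["n", "\n"], ["r", "\r"], ["t", "\t"], ["v", "\v"],
]);
const isHexDigits = (s) => /^[0-9A-Fa-f]+$/.test(s);

function svOfBody(body) {
  let sv = "";
  let i = 0;
  while (i < body.length) {
    const ch = body[i];
    if (ch === '"' || LINE_TERMINATORS.includes(ch)) {
      // Not a DoubleStringCharacter in ES2017.
      throw new SyntaxError(`unexpected character at index ${i}`);
    }
    if (ch !== "\\") {
      sv += ch; // DoubleStringCharacter :: SourceCharacter
      i += 1;
      continue;
    }
    // DoubleStringCharacter :: \ EscapeSequence => SV of the EscapeSequence
    const e = body[i + 1];
    if (e === undefined) throw new SyntaxError("backslash at end of literal");
    if (LINE_TERMINATORS.includes(e)) {
      // LineContinuation contributes nothing; \r\n counts as one terminator.
      i += e === "\r" && body[i + 2] === "\n" ? 3 : 2;
    } else if (e === "x") {
      const hh = body.slice(i + 2, i + 4);
      if (hh.length !== 2 || !isHexDigits(hh)) throw new SyntaxError("bad \\x escape");
      sv += String.fromCharCode(parseInt(hh, 16)); // HexEscapeSequence
      i += 4;
    } else if (e === "u" && body[i + 2] === "{") {
      const close = body.indexOf("}", i + 3);
      const digits = close === -1 ? "" : body.slice(i + 3, close);
      if (!isHexDigits(digits) || parseInt(digits, 16) > 0x10ffff) {
        throw new SyntaxError("bad \\u{...} escape");
      }
      sv += String.fromCodePoint(parseInt(digits, 16)); // u{ HexDigits }
      i = close + 1;
    } else if (e === "u") {
      const hhhh = body.slice(i + 2, i + 6);
      if (hhhh.length !== 4 || !isHexDigits(hhhh)) throw new SyntaxError("bad \\u escape");
      sv += String.fromCharCode(parseInt(hhhh, 16)); // u Hex4Digits
      i += 6;
    } else if (e === "0" && !/[0-9]/.test(body[i + 2] || "")) {
      sv += "\0"; // 0 [lookahead ∉ DecimalDigit]
      i += 2;
    } else if (/[0-9]/.test(e)) {
      throw new SyntaxError("legacy octal escape (Annex B) not modelled");
    } else {
      // CharacterEscapeSequence: SingleEscapeCharacter or NonEscapeCharacter.
      sv += SINGLE_ESCAPES.has(e) ? SINGLE_ESCAPES.get(e) : e;
      i += 2;
    }
  }
  return sv;
}

// Cross-check against the engine's own parser.
const engineSV = (body) => new Function(`return "${body}";`)();
for (const body of [String.raw`abc\x20xyz`, String.raw`a\u0061\u{1F600}\q\0`, String.raw`tab\there`]) {
  console.log(svOfBody(body) === engineSV(body)); // true
}
```

The String value of the whole literal is then just the concatenation of these per-piece SVs, which is what the evaluator builds up in `sv`.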

Questions

  1. What is meant by "escape sequence" in Note 1? I am trying to understand what an escape sequence actually does, rather than just its lexical grammar.
  2. Why does CharacterEscapeSequence include NonEscapeCharacter?
  3. The descriptions in 11.8.4.3 Static Semantics: SV do not seem to follow the normal ECMAScript convention for lexical grammar productions. What is meant by those descriptions?
  4. Added question: Does Note 1 mean that a code point can appear either directly within the quotes or, alternatively, after an escape sequence (such as a backslash)? Is that what "Any code points may appear in the form of an escape sequence" means?
Magnus

1 Answer

  1. What is meant by escape sequence in Note 1?

    It is the EscapeSequence nonterminal whose grammar you quoted from 11.8.4 String Literals, Syntax.

  2. Why does CharacterEscapeSequence include NonEscapeCharacter?

    Because an unrecognized escape simply has its backslash ignored: for example, '\c' === 'c'. Changing that would break backwards compatibility.

  3. 11.8.4.3 Static Semantics: SV contains descriptions such as “The SV of DoubleStringCharacter :: \ EscapeSequence is the SV of the EscapeSequence”. Those lines do not follow the normal ECMAScript convention for lexical grammar productions. What is meant by those descriptions?

    It means that you look up the SV rule, in the same section, for whichever production the EscapeSequence matched. For example, in the literal "\x20", \x20 is a DoubleStringCharacter consisting of \ and the EscapeSequence x20, which in turn is a HexEscapeSequence (x HexDigit HexDigit), whose SV is given by:

    The SV of HexEscapeSequence :: x HexDigit HexDigit is the code unit value that is (16 times the MV of the first HexDigit) plus the MV of the second HexDigit.

    For "\x20" that is 16 × 2 + 0 = 32, the code unit for U+0020 (SPACE).
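The HexEscapeSequence rule quoted in the answer can be transcribed almost literally. This is a sketch, not spec text: `mvOfHexDigit` and `svOfHexEscape` are hypothetical names, and the final line repeats the '\c' example from the answer to question 2.

```javascript
// MV of a single HexDigit: 0-9 map to 0-9, a-f / A-F map to 10-15.
function mvOfHexDigit(d) {
  if (typeof d !== "string" || d.length !== 1) throw new SyntaxError(`not a HexDigit: ${d}`);
  const mv = "0123456789abcdef".indexOf(d.toLowerCase());
  if (mv === -1) throw new SyntaxError(`not a HexDigit: ${d}`);
  return mv;
}

// SV of HexEscapeSequence :: x HexDigit HexDigit
// = the code unit (16 * MV of first HexDigit) + MV of second HexDigit.
function svOfHexEscape(first, second) {
  return String.fromCharCode(16 * mvOfHexDigit(first) + mvOfHexDigit(second));
}

console.log(svOfHexEscape("2", "0") === "\x20");     // true: 16*2 + 0 = 32, a space
console.log(svOfHexEscape("6", "1") === "a");        // true: 16*6 + 1 = 97
console.log(svOfHexEscape("F", "f").charCodeAt(0));  // 255

// Question 2: an unrecognized escape is a NonEscapeCharacter.
console.log("\c" === "c"); // true
```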

Ry-
  • Thank you, Ryan. Under Q3, is the spec saying that a lexical grammar production `DoubleStringCharacter :: \ EscapeSequence` has a string value? I am having a bit of trouble grasping what it is saying. Also, 11.8.4.2 seems to define `StringValue` as a `StringLiteral`. It says something I don't get either: `Return the String value whose elements are the SV of this StringLiteral.` Any thoughts there? Thanks again. – Magnus Apr 02 '18 at 01:48
  • Just as a heads up: I added some clarifications and an extra question to the OP. – Magnus Apr 02 '18 at 02:08
  • @Magnus: Yes, it’s saying `DoubleStringCharacter` has a string value determined by what it matched. The string value of a literal is the string (the primitive type), “String value”, with elements determined by the concatenation of the SVs of the parts of the literal. – Ry- Apr 02 '18 at 02:12
  • @Magnus: And as for question 4: yes, it means that `"a"` (containing the code point `a`) can also be represented by an escape sequence as `"\x61"`. – Ry- Apr 02 '18 at 02:14
  • Thanks again, Ryan. What happens if I have the string literal "abc\u000Axyz"? According to 10.1 `escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator`. However, 11.8.4. (and your x61 example) seem to indicate that the escape sequence will be interpreted. – Magnus Apr 02 '18 at 02:51
  • The last confusing piece is the difference between: `string literal`, `String value`, and `SV`. It seems a `string literal` represents a value of type `String`. This value is called a `String value`, and is abbreviated `SV`. If the `SV` is longer than one code point, it would consist of several smaller `SV`s (down to one code point each). Is that right? – Magnus Apr 02 '18 at 03:02
  • @Magnus: The escape sequence will be interpreted *as a string value*; I don’t see any contradiction. An SV can be both longer than one UTF-16 code unit and longer than one code point. I don’t think it’s meaningful to say that it consists of smaller SVs. – Ry- Apr 02 '18 at 03:16
  • By interpreted as a string value, you mean it will literally just keep saying "\u000A" in the middle? It will not cause some kind of line break after abc, given that x000A is the line feed Unicode? – Magnus Apr 02 '18 at 03:22
  • @Magnus: No, I mean the line break is part of the string value. – Ry- Apr 02 '18 at 03:36
  • Thank you for all the help, Ryan. To avoid cluttering the post, I will post a new specific question on the two topics that still feel confusing. – Magnus Apr 02 '18 at 11:57
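Magnus's "abc\u000Axyz" example from the comments can be tested directly. A minimal sketch, assuming a modern engine such as Node.js; `parses` is a hypothetical helper. It shows both halves of Ry-'s point: the escape is source text inside the literal, so it never terminates the line, yet the resulting String value does contain a line feed. An actual line feed between the quotes, by contrast, is not a DoubleStringCharacter and makes the literal a SyntaxError.

```javascript
const s = "abc\u000Axyz";
console.log(s.length);            // 7
console.log(s.charCodeAt(3));     // 10 (LINE FEED)
console.log(s === "abc\nxyz");    // true
console.log(JSON.stringify(s));   // "abc\nxyz"

// Hypothetical helper: does this function body parse?
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses('return "abc\\u000Axyz";')); // true: escape in the source
console.log(parses('return "abc\nxyz";'));      // false: raw LF in the source
```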