2

The excerpts below refer to ECMAScript 2017.

11.8.4.2 Static Semantics: StringValue

StringLiteral::
    "DoubleStringCharactersopt"
    'SingleStringCharactersopt'

1. Return the String value whose elements are the SV of this StringLiteral.

11.8.4.3 Static Semantics: SV

A string literal stands for a value of the String type. The String value (SV) of the literal is described in terms of code unit values contributed by the various parts of the string literal.

Questions

In the excerpts above, the following terms appear:

  1. string literal
  2. Nonterminal symbol StringLiteral
  3. String value
  4. SV

Could someone help explain the difference between these terms?

Also, what does the last sentence in 11.8.4.2 mean?

Magnus
  • 6,791
  • 8
  • 53
  • 84

2 Answers

4

A string literal is the thing that you, a human writing or reading code, can recognize as the sequence "..." or '...' in the source text.

The symbol StringLiteral is a nonterminal in the formal grammar of ECMAScript that can be replaced by a sequence of terminals making up an actual string literal.

A string value is the semantic content of a string literal. The spec says

The String value (SV) of the literal is ...

Therefore, we may be sure that a string literal has a string value: the string value of some string literal is a collection of code unit values.

The identifier SV appears to be shorthand for (and used interchangeably with) "string value".
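
As a rough illustration (an aside of mine, not from the spec or the answer), you can see at runtime that a String value is just a sequence of code unit values:

var s = "A\u00E9\uD83D\uDE00";   // the escapes spell out "A", "é", and an emoji
console.log(s.length);           // 4 -- length counts code units, not characters
for (var i = 0; i < s.length; i++) {
    console.log(s.charCodeAt(i).toString(16));   // 41, e9, d83d, de00
}

The emoji contributes two code units (a surrogate pair), which is exactly the level of detail the SV rules are written at.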


Also, what does the last sentence in 11.8.4.2 mean?

Every nonterminal "returns" some value when it is evaluated. The line

Return the String value whose elements are the SV of this StringLiteral.

simply means that when the parser finds a StringLiteral in the text of a program, the result of parsing that nonterminal is the string value (i.e., collection of code unit values) associated with the just-parsed StringLiteral.
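
To make that concrete, here is a toy sketch of my own (nothing like a real engine, and it only handles plain characters plus the \n and \\ escapes) of what "returning the String value of a StringLiteral" could look like:

// Hypothetical helper: given the source text of a StringLiteral, quotes included,
// return its String value as an array of code unit values.
function stringValueOf(literalSource) {
    var body = literalSource.slice(1, -1);       // drop the surrounding quotes
    var codeUnits = [];
    for (var i = 0; i < body.length; i++) {
        var ch = body[i];
        if (ch === "\\") {                       // escape sequence: consume the next character
            var next = body[++i];
            if (next === "n") codeUnits.push(0x0A);      // \n contributes a LINE FEED code unit
            else codeUnits.push(next.charCodeAt(0));     // \\ , \" , \' contribute the character itself
        } else {
            codeUnits.push(ch.charCodeAt(0));
        }
    }
    return codeUnits;
}

console.log(stringValueOf('"a\\nb"'));           // [97, 10, 98]

The real SV rules cover many more cases (hex and unicode escapes, line continuations, and so on), but the shape is the same: each piece of the literal contributes zero or more code units.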

apsillers
  • 112,806
  • 17
  • 235
  • 239
  • Ah, thanks. Is this right: a lexer turns the source code into valid tokens, which are only terminal symbols, using the specified Lexical Grammar (normally represented with regular expressions). Then, the parser evaluates those tokens and turns them into machine code (or similar)? – Magnus Apr 03 '18 at 17:39
  • @Magnus That's probably right, but I'm not an expert on parsing. The grammar goes pretty far down (to the character-by-character level), but doesn't deal with white space (e.g., `1+1` is the same as `1 + 1`) so a lexer would probably eliminate white space issues via tokenization first. And yes, a parser parses the tokens and sometimes performs "static semantics" steps that consist of operations and error rules that are beyond simple grammar rules (see https://www.ecma-international.org/ecma-262/8.0/index.html#sec-static-semantic-rules) – apsillers Apr 03 '18 at 18:33
3

A lot of the terminology you're looking at is really of value to JavaScript platform maintainers; in practical terms, you almost certainly already know what a "string" is. The other terms are useful for reading the spec.

The term StringLiteral refers to a piece of JavaScript source code that a JavaScript programmer would look at and call "a string"; in other words, in

let a = "hello world";

the StringLiteral is that run of characters on the right side of the = from the opening double-quote to the closing double-quote. It's a "nonterminal" because it's not a "terminal" symbol in the definition of the grammar. Language grammars are built from terminal symbols at the lowest level and non-terminals to describe higher-level subsections of a program. The bold-faced double-quote characters you see in the description of a double-quoted string are examples of terminal symbols.
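
Roughly (a simplified sketch of mine, not the exact spec productions), expanding that nonterminal step by step might look like:

StringLiteral
    " DoubleStringCharacters "
    " h DoubleStringCharacters "
    ...
    " hello world "

where the two double-quote characters and the individual letters are the terminals that end the expansion.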

The term StringValue refers to an internal operation that applies to several components of the grammar; for StringLiteral it has the fairly obvious definition you posted. Semantic rules are written in terms of non-terminals that make up some grammar concept.

The term String value or SV is used for describing the piece-by-piece portions of a string.

The JavaScript spec is particularly wacky with terminology, because the language committee is stuck with describing semantics that evolved willy-nilly in the early years of language adoption. Inventing layers of terminology with much apparent redundancy is a way of coping with the difficulty of creating unambiguous descriptions of what bits of code are supposed to do, down to the last detail and weird special case. It's further complicated by the fact that (for reasons unknown to me) the lexical grammar is broken down in as much excruciating detail as are higher-level constructs, so that really compounds the nit-picky feel of the spec.

An example of when knowing that expanse of terminology would be useful might be an explanation of why it's necessary to "double-up" on backslashes when building a regular expression from a string literal instead of a regular expression literal. It's clear that a call to the RegExp constructor:

var r = new RegExp("foo\\.bar");

has an expression consisting of just one StringLiteral. To make the call to the constructor, then, the semantic rules for that operation will at some point call for getting the StringValue (and thus SV) of that literal, and those rules contain the details for every piece of the literal. That's where you come across the fact that the SV semantics have rules for backslashes, and in particular one that says two backslashes collapse to one.
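
To see that collapse concretely (my own illustration, not part of the answer or the spec text):

var s = "foo\\.bar";
console.log(s);                               // foo\.bar -- the SV turns the two backslashes into one
console.log(s.length);                        // 8, not 9
console.log(new RegExp(s).test("foo.bar"));   // true
console.log(new RegExp(s).test("fooXbar"));   // false -- \. in the pattern matches only a literal dot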

Now, I'm not saying that explanation would be better than a simpler one, but it is explicit about every detail of the question.

Pointy
  • 405,095
  • 59
  • 585
  • 614