
From Douglas Crockford's JavaScript: The Good Parts, Chapter 2, "Grammar":

This chapter introduces the grammar of the good parts of JavaScript, presenting a quick overview of how the language is structured. We will represent the grammar with railroad diagrams.

The rules for interpreting these diagrams are simple:

  1. You start on the left edge and follow the tracks to the right edge.
  2. As you go, you will encounter literals in ovals, and rules or descriptions in rectangles.
  3. Any sequence that can be made by following the tracks is legal.
  4. Any sequence that cannot be made by following the tracks is not legal.
  5. Railroad diagrams with one bar at each end allow whitespace to be inserted between any pair of tokens. Railroad diagrams with two bars at each end do not.

The grammar of the good parts presented in this chapter is significantly simpler than the grammar of the whole language.

I have seen this answer on SO which basically reiterates what is presented in the book. So what is meant by token here?

Geek

1 Answer


Tokens are the basic atomic units of a grammar. In a typical programming language, tokens would include things like algebraic operators (+, *), punctuation and separators ((, {, ;), identifiers, numeric and string values, and reserved words.
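To make the idea concrete, here is a minimal sketch of a tokenizer for a tiny JavaScript-like subset. The `tokenize` function, the `RESERVED` set, and the token type names are hypothetical and chosen only for illustration; a real JavaScript lexer handles far more cases (comments, regular expression literals, multi-character operators such as `===`, and so on).

```javascript
// Hypothetical reserved-word list, deliberately incomplete.
const RESERVED = new Set(["for", "var", "if", "else", "return", "function"]);

// Split source text into tokens; whitespace separates tokens but is not one.
function tokenize(source) {
  // Sticky regex: the alternatives are tried in order at the current position.
  // Groups: 1 = number, 2 = string, 3 = word, 4 = operator or punctuation.
  const pattern =
    /\s+|(\d+(?:\.\d+)?)|("(?:[^"\\]|\\.)*")|([A-Za-z_$][\w$]*)|([-+*\/=<>!]=?|[(){};,])/y;
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    pattern.lastIndex = pos;
    const m = pattern.exec(source);
    if (m === null) {
      throw new SyntaxError(`Unexpected character at position ${pos}`);
    }
    pos = pattern.lastIndex;
    if (m[1] !== undefined) {
      tokens.push({ type: "number", text: m[1] });
    } else if (m[2] !== undefined) {
      tokens.push({ type: "string", text: m[2] });
    } else if (m[3] !== undefined) {
      const type = RESERVED.has(m[3]) ? "reserved" : "identifier";
      tokens.push({ type, text: m[3] });
    } else if (m[4] !== undefined) {
      tokens.push({ type: "punctuator", text: m[4] });
    }
    // Otherwise the match was whitespace, which produces no token.
  }
  return tokens;
}

// Both calls yield the same six tokens: x, =, 23, +, y, ;
console.log(tokenize("x = 23 + y;").map((t) => t.text));
console.log(tokenize("x=23+y;").map((t) => t.text));
```

In this sketch, whitespace between tokens changes nothing about the resulting token sequence, which is the sense in which a grammar "allows whitespace between any pair of tokens".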

The concept of a "token" is somewhat bound up with the way a grammar is written and parsed. Some parsing schemes don't involve the concept of tokenization (packrat parsers for PEGs). However, in this case the use of a railroad diagram implies a traditional BNF (or BNF-like) grammar, complete with a set of tokens.

Edit: looking at that other question, the discussion there is about a token grammar itself, namely the token grammar for JSON. I suppose you could consider the elements of the character set to be "tokens" for that purpose. In any case, it should be clear that in those cases (the rules for what numbers and strings look like) spaces can't appear in the middle of those constructs. That is, 23 and 2 3 are not the same.
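A quick way to see this is with the built-in `JSON.parse`, which accepts whitespace between JSON tokens but not inside a number. The `parsesAsJson` helper below is a hypothetical wrapper added only for this illustration.

```javascript
// Hypothetical helper: report whether a text is accepted by JSON.parse.
function parsesAsJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything other than a syntax error is unexpected here
  }
}

console.log(JSON.parse("23"));           // 23: a single number token
console.log(parsesAsJson("[ 2 , 3 ]"));  // true: whitespace between tokens
console.log(parsesAsJson("2 3"));        // false: not one number, not valid JSON
```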

Outside of the bizarre situations around automatic semicolon insertion, I can't think of places in the JavaScript grammar that disallow spaces between tokens.
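One well-known instance of the semicolon insertion case involves `return`: a line break (rather than an ordinary space) between `return` and its expression ends the statement early. The function names below are hypothetical, picked only to show the effect.

```javascript
function returnsValue() {
  return 42;
}

function returnsUndefined() {
  // A semicolon is inserted after `return` because a line break follows it,
  // so the 42 on the next line is never reached.
  return
  42;
}

console.log(returnsValue());      // 42
console.log(returnsUndefined());  // undefined
```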

Pointy
  • what is *BNF* grammar? – Geek Jul 22 '13 at 17:55
  • @Geek oh it means "Backus-Naur Form". It's a notation for writing grammars. – Pointy Jul 22 '13 at 17:57
  • @Geek it would help a lot if you could provide more context from the part of the book where you found that. People use railroad diagrams for the "high level" view of language grammar, and also for the token grammar itself. – Pointy Jul 22 '13 at 18:02
  • I did put some more context. – Geek Jul 22 '13 at 18:07
  • @Geek OK, well for that specific case, I think it's safe to interpret "token" as meaning "any of the grammar elements explicitly mentioned in the railroad diagram". So if the diagram is about characters (like the JSON diagram), then a "token" is a character. If the diagram is about (for example) the grammar for a `for` loop, then "token" would refer to things like the `for` keyword, parentheses, variable names and other expression components, semicolons, and so on. – Pointy Jul 22 '13 at 18:30
  • @Geek all of the terminology here is basic stuff for programming language theory and design. There are many great books on the subject. – Pointy Jul 22 '13 at 18:35