If I break down the compilation process for a C or C++ source file into these steps:

  • A: Preprocessing.
  • B: Tokenizing (collecting and listing keywords, identifiers, symbols, literals (strings, characters, numbers)).
  • C: Assembling collected tokens into a structured form, such as a tree.
  • D: Processing and verifying this structured form by analyzing its semantics.
  • E: Generating a list of instructions (e.g., assembly).

My nomenclature questions are:

  1. Is syntax parsing all of #ABC? #BC? just #C?
  2. What terms should I use for #ABC? #BC? #C?
  3. What is lexing, here? Is it just #B?
  4. Is #D semantic parsing?
Chad
Aotium

2 Answers

While only the preprocessing stage is part of the language standard, most platforms divide the full build process into

  • preprocessing,
  • compiling,
  • assembling,
  • linking.

Compiling is the phase that does all the "hard work": it starts with lexing and parsing, and optimization happens somewhere later in the same phase.

Some modern toolchains that use some form of link-time optimization may defer or repeat the compilation and assembly stages until every constituent object file has been processed once. Conceptually, though, this is not much different from concatenating all the input files of your project into one large file and compiling that.

Kerrek SB

(A) is simple preprocessing: cut and paste
(B) is lexical analysis
(C) is syntax analysis [parsing]
(D) is semantic analysis [number 5 in the attached link]

Your A, B, C, and D are basically the front end of a compiler, while your E is the final stage of its back end.
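
To make the naming concrete, here is a rough sketch of stages B through E for a toy arithmetic language (not C, and with no preprocessing step). It is written in Python only for brevity. Everything in it is a made-up illustration: the function names `lex`, `parse`, `check`, `generate` and `compile_expr`, the tuple-based tree, and the stack-machine opcodes are assumptions of this sketch, not part of any real compiler.

```python
import re

# Stage B, lexical analysis: characters in, flat list of (kind, text) tokens out.
TOKEN_RE = re.compile(
    r"(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<sym>[-+*/()])|(?P<ws>\s+)"
)

def lex(source):
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        if m.lastgroup != "ws":          # whitespace is skipped, not emitted
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# Stage C, syntax analysis (parsing): tokens in, tree out.
# Grammar: expr := term (('+'|'-') term)*
#          term := factor (('*'|'/') factor)*
#          factor := num | id | '(' expr ')'
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        kind, text = advance()
        if kind in ("num", "id"):
            return (kind, text)
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = ("binop", op, node, factor())
        return node

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = ("binop", op, node, term())
        return node

    tree = expr()
    if peek()[0] != "eof":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

# Stage D, semantic analysis: here just "is every identifier declared?".
def check(tree, declared):
    if tree[0] == "id" and tree[1] not in declared:
        raise NameError(f"undeclared identifier {tree[1]!r}")
    if tree[0] == "binop":
        check(tree[2], declared)
        check(tree[3], declared)

# Stage E, code generation for a hypothetical stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(tree):
    if tree[0] == "num":
        return [f"PUSH {tree[1]}"]
    if tree[0] == "id":
        return [f"LOAD {tree[1]}"]
    _, op, left, right = tree
    return generate(left) + generate(right) + [OPCODES[op]]

def compile_expr(source, declared):
    tree = parse(lex(source))   # front end: B, then C
    check(tree, declared)       # front end: D
    return generate(tree)       # back end: E

print(compile_expr("a + 2 * (b - 1)", {"a", "b"}))
```

The point of the split is that each stage consumes only the previous stage's output: `parse` never looks at raw characters, and `generate` never looks at tokens. That is why B and C usually get separate names (lexer and parser) even though both deal with the source text.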

amit
  • So could [ABC] be called textual processing or something? What about [BC]? I'm looking for appropriate names for the modules and submodules of a compiler I'm trying to design. – Aotium Feb 07 '12 at 23:39