#include<stdio.h>

int main()
{
  int a,b;
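  a=a+b;  /* a and b are read uninitialized here (undefined behavior at run time); this does not affect lexing */
  printf("%d",a);
  return 0;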
}

What should the output be if this code is passed through a lexer?

Bryan Oakley
Hick

2 Answers


The lexer just tokenizes the input: it turns a stream of characters into a stream of tokens, which a parser later consumes to build a full syntax tree. For your example you would obtain something like:

#include <stdio.h> (this is handled by the preprocessor, not by the lexer, so it wouldn't appear here)

int KEYWORD
main IDENT
( LPAR
) RPAR
{ LBRACE
int KEYWORD
a IDENT
, COMMA
b IDENT
; SEMICOL
a IDENT
= ASSIGN
a IDENT
+ PLUS
b IDENT
; SEMICOL
printf IDENT
( LPAR
"%d" STRING
, COMMA
a IDENT
) RPAR
; SEMICOL
return KEYWORD
0 INTEGER
; SEMICOL
} RBRACE

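Of course, a lexer by itself can't do much: it just splits the source into the smallest meaningful elements, checking only for lexical errors such as an invalid character or an unterminated string literal. A misspelled keyword like retrun is still a valid identifier, so it passes the lexer and is only rejected later. You will need something that combines the tokens into a structure that can be given semantic meaning.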

Just a side note: some lexers group similar kinds of tokens into a single one (for example, one KEYWORD token for all keywords, with an associated parameter saying which keyword it is), while others have a distinct token for each one, such as RETURN_KEYWORD, IF_KEYWORD and so on.

Jack
  • Doesn't the preprocessor take out the entire #include and essentially inline the content of whatever file is included? – JustJeff Apr 18 '10 at 12:52
  • I'm not sure about C compilers: whether they really inline the header and lex it again, or just use the includes to know what should be defined (without actually relexing the whole header), especially for the standard headers (the ones in < >). – Jack Apr 18 '10 at 12:55
  • There is a typo in the answer, it should be `RETURN_KEYWORD`, but I can't edit the answer because an edit must change at least 6 characters. – ollydbg23 Feb 03 '17 at 13:35

Preprocessor directives will not be present in the input to the compiler because the preprocessor consumes them. For example, #include<stdio.h> will be replaced by the contents of the stdio.h file.

The resulting file is broken down into tokens by the scanner according to the lexical rules of C, and the tokens are passed to the parser as and when it asks for them.

codaddict