
I'm building a compiler (without using tools like lex or bison) for a simplified C-like language and have gotten past the lexer and parser. I am not sure whether the way I am doing the parsing is correct, because so far, to check whether the syntax is valid, I haven't used linked lists at all. Basically, my parser looks like this. Suppose the syntax is:

<program> ::= <program_header> <program_body>
<program_header>::= program <identifier> is
<program_body> ::= (<declaration>;)*
begin
(<statement>;)*
end program

My program looks like this:

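void parser(void)
{
    char *next_token = get_token();   /* first token from the lexer */
    check_for_program(next_token);
}

/* <program> ::= <program_header> <program_body> */
void check_for_program(char *next_token)
{
    check_for_program_header(next_token);
    if (header_found)                 /* flag presumably set by check_for_program_header() */
        check_for_program_body();
}
...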

I basically have a function for each non-terminal, call them at the appropriate times, and check for keywords with strcmp. Is this method OK?

From this point, how do I go about semantic analysis? Where should I start building the symbol table?

Any suggestions or pointers would be great! Thank you very much.

beanyblue

2 Answers


Well, a common and rather simple way of doing it is to write a recursive descent parser, i.e. to create functions that correspond to the rules of your syntax (which you seem to have started doing already):

e.g.

<program> ::= <program_header> <program_body>
<program_header>::= program <identifier> is
<program_body> ::= (<declaration>;)*

would correspond to something like

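#include <string.h>   /* strcmp */

char *get_token(void);       /* lexer, defined elsewhere */
void report_error(void);     /* error handler, defined elsewhere */
void program_header(void);
void program_body(void);
void declaration(void);

/* <program> ::= <program_header> <program_body> */
void program(void)
{
    program_header();
    program_body();
}

/* <program_header> ::= program <identifier> is */
void program_header(void)
{
    char *program_token = get_token();
    if (program_token == NULL || strcmp(program_token, "program") != 0)
        report_error();              /* expected the keyword "program" */
    char *identifier = get_token();
    if (identifier == NULL)
        report_error();              /* expected an identifier */
    ...
}

/* <program_body> ::= (<declaration>;)* ... */
void program_body(void)
{
    declaration();
    ...
}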

Inside each function you put the semantic checks. You will need a symbol table: either a single global table if you don't want to handle scopes, or some kind of stack of symbol tables, one per scope.
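To make the "stack of symbol tables" idea a bit more concrete, here is a minimal sketch. It is only one possible layout, not the definitive way to do it: every name in it (Type, Symbol, Scope, scope_push, scope_pop, symbol_declare, symbol_lookup) is made up for illustration, and the two types are placeholders since the question does not say which types the language has. Each scope is a small linked list of symbols, and the scopes themselves are chained from innermost to outermost.

```c
#include <stdlib.h>
#include <string.h>

/* All names here (Type, Symbol, Scope, scope_push, ...) are hypothetical. */
typedef enum { TYPE_INTEGER, TYPE_FLOAT } Type;   /* placeholder types */

typedef struct Symbol {
    char *name;
    Type type;
    struct Symbol *next;      /* next symbol declared in the same scope */
} Symbol;

typedef struct Scope {
    Symbol *symbols;          /* linked list of symbols in this scope */
    struct Scope *parent;     /* enclosing scope, NULL for the outermost */
} Scope;

static Scope *current_scope = NULL;   /* top of the scope stack */

/* Enter a new scope, e.g. at the start of a procedure body. */
void scope_push(void)
{
    Scope *scope = malloc(sizeof *scope);
    if (scope == NULL)
        abort();
    scope->symbols = NULL;
    scope->parent = current_scope;
    current_scope = scope;
}

/* Leave the innermost scope and free its symbols. */
void scope_pop(void)
{
    Scope *scope = current_scope;
    if (scope == NULL)
        return;
    current_scope = scope->parent;
    while (scope->symbols != NULL) {
        Symbol *next = scope->symbols->next;
        free(scope->symbols->name);
        free(scope->symbols);
        scope->symbols = next;
    }
    free(scope);
}

/* Search a single scope only. */
static Symbol *find_in_scope(const Scope *scope, const char *name)
{
    for (Symbol *s = scope->symbols; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Check for a declaration: returns 0 on success, -1 if no scope is open
   or the name is already declared in the innermost scope. */
int symbol_declare(const char *name, Type type)
{
    if (current_scope == NULL || find_in_scope(current_scope, name) != NULL)
        return -1;
    Symbol *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    s->name = malloc(strlen(name) + 1);
    if (s->name == NULL)
        abort();
    strcpy(s->name, name);
    s->type = type;
    s->next = current_scope->symbols;
    current_scope->symbols = s;
    return 0;
}

/* Check for a use: search from the innermost scope outwards. */
const Symbol *symbol_lookup(const char *name)
{
    for (const Scope *scope = current_scope; scope != NULL; scope = scope->parent) {
        Symbol *s = find_in_scope(scope, name);
        if (s != NULL)
            return s;
    }
    return NULL;
}
```

With something like this in place, declaration() could call symbol_declare() right after reading the identifier and report an error when it returns -1, and a statement that uses a name could call symbol_lookup() and report an error when it returns NULL. If you don't need scopes, a single scope_push() at program start gives you the global table.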

AndersK

Yes, this is one way of parsing; it is called a recursive descent parser. It is an informal, hand-coded approach, which means you have to change the parsing code whenever you change the grammar.
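A small sketch may show why the code and the grammar are tied together. It follows the grammar from the question, but with several assumptions of mine: tokens come from a NULL-terminated array of strings instead of a real lexer, a declaration is taken to be "integer <identifier>" and a statement "<identifier> := <identifier>" (the question does not define either), and all function names are hypothetical. Each rule is one function, and each (...)* repetition becomes a loop driven by the next token.

```c
#include <ctype.h>
#include <string.h>

static const char **tokens;   /* NULL-terminated array standing in for a lexer */
static int pos;               /* index of the current (lookahead) token */

static const char *peek(void) { return tokens[pos]; }

/* Consume the current token if it equals word; return 1 on success. */
static int accept(const char *word)
{
    if (peek() != NULL && strcmp(peek(), word) == 0) {
        pos++;
        return 1;
    }
    return 0;
}

static int is_keyword(const char *t)
{
    static const char *keywords[] = { "program", "is", "begin", "end", "integer" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(t, keywords[i]) == 0)
            return 1;
    return 0;
}

/* Simplified check: an identifier starts with a letter and is not a keyword. */
static int identifier(void)
{
    const char *t = peek();
    if (t == NULL || !isalpha((unsigned char)t[0]) || is_keyword(t))
        return 0;
    pos++;
    return 1;
}

/* <program_header> ::= program <identifier> is */
static int program_header(void)
{
    return accept("program") && identifier() && accept("is");
}

/* Assumed rule: <declaration> ::= integer <identifier> */
static int declaration(void)
{
    return accept("integer") && identifier();
}

/* Assumed rule: <statement> ::= <identifier> := <identifier> */
static int statement(void)
{
    return identifier() && accept(":=") && identifier();
}

/* <program_body> ::= (<declaration>;)* begin (<statement>;)* end program */
static int program_body(void)
{
    /* (<declaration>;)* : repeat while the lookahead starts a declaration */
    while (peek() != NULL && strcmp(peek(), "integer") == 0)
        if (!declaration() || !accept(";"))
            return 0;
    if (!accept("begin"))
        return 0;
    /* (<statement>;)* : repeat until the lookahead is "end" */
    while (peek() != NULL && strcmp(peek(), "end") != 0)
        if (!statement() || !accept(";"))
            return 0;
    return accept("end") && accept("program");
}

/* <program> ::= <program_header> <program_body>; returns 1 if the input is valid. */
int parse_program(const char **token_array)
{
    tokens = token_array;
    pos = 0;
    return program_header() && program_body() && peek() == NULL;
}
```

Adding, say, a second declaration type or a new statement form means editing declaration(), statement() and the loop conditions in program_body() by hand, which is the cost mentioned above.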

There are also formal, table-driven parsing methods such as LL and SLR. The formal methods have two advantages: you can prove that the parser accepts exactly what your grammar defines (that's why they are called formal), and the parsing engine is generic, so you code it once and it can parse any compatible grammar.
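As a rough illustration of the "code once" point, here is a sketch of a table-driven LL(1) parser. It is a toy under my own assumptions: the grammar is a cut-down version of the one in the question (only declarations of the assumed form "integer <identifier> ;", and no statements), the input is already a sequence of token codes ending in T_EOF, the parse table was filled in by hand, and all names are hypothetical. The point is that ll1_parse() contains nothing specific to the grammar; only productions[] and table[] do.

```c
/* Terminals first, then non-terminals, so "symbol < NUM_TERMINALS" tests for a terminal. */
enum {
    T_PROGRAM, T_ID, T_IS, T_INTEGER, T_SEMI, T_BEGIN, T_END, T_EOF,
    NUM_TERMINALS,
    N_PROGRAM = NUM_TERMINALS, N_DECLS,
    NUM_SYMBOLS
};

#define MAX_RHS   8
#define STACK_MAX 64

typedef struct {
    int length;          /* number of symbols on the right-hand side */
    int rhs[MAX_RHS];
} Production;

static const Production productions[] = {
    /* 1: Program -> program id is Decls begin end program */
    { 7, { T_PROGRAM, T_ID, T_IS, N_DECLS, T_BEGIN, T_END, T_PROGRAM } },
    /* 2: Decls -> integer id ; Decls */
    { 4, { T_INTEGER, T_ID, T_SEMI, N_DECLS } },
    /* 3: Decls -> (empty) */
    { 0, { 0 } },
};

/* table[non-terminal][lookahead] = production number (1-based), 0 = syntax error */
static const int table[NUM_SYMBOLS - NUM_TERMINALS][NUM_TERMINALS] = {
    [N_PROGRAM - NUM_TERMINALS][T_PROGRAM] = 1,
    [N_DECLS - NUM_TERMINALS][T_INTEGER]   = 2,
    [N_DECLS - NUM_TERMINALS][T_BEGIN]     = 3,
};

/* Generic LL(1) driver: input is an array of terminal codes ending with T_EOF.
   Returns 1 if the input is derivable from N_PROGRAM, 0 otherwise. */
int ll1_parse(const int *input)
{
    int stack[STACK_MAX];
    int top = 0;

    stack[top++] = T_EOF;
    stack[top++] = N_PROGRAM;

    while (top > 0) {
        int symbol = stack[--top];
        if (symbol < NUM_TERMINALS) {
            /* Terminal on the stack: it must match the current input token. */
            if (symbol != *input)
                return 0;
            if (symbol == T_EOF)
                return 1;
            input++;
        } else {
            /* Non-terminal: pick a production from the table and expand it. */
            int entry = table[symbol - NUM_TERMINALS][*input];
            if (entry == 0)
                return 0;
            const Production *p = &productions[entry - 1];
            if (top + p->length > STACK_MAX)
                return 0;
            /* Push the right-hand side in reverse so its first symbol is on top. */
            for (int i = p->length - 1; i >= 0; i--)
                stack[top++] = p->rhs[i];
        }
    }
    return 0;
}
```

To parse a different compatible grammar you would replace the enum, productions[] and table[] and leave ll1_parse() alone. In practice the table is computed from the grammar (via FIRST and FOLLOW sets) rather than written by hand as it is here.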

fbafelipe