I'm writing an interpreter for a new programming language. The language's syntax is very simple, and the "system library" commands are treated as plain identifiers: there is no special construct for them; each one is a function like everything else, only pre-defined internally. And no, this is not yet another one of the million Lisps out there.

The question is:

Should I have the Lexer catch them, or should I do it in the AST-construction code?

What I've done so far:

I tried recognizing all of them in my Lexer script, and there are a lot of them already: over 200. I send the same token back to Bison (SYSTEM_CMD), only with a different value (basically a numeric index into the array where all the system commands are stored).
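For illustration, the relevant part of the Lexer looks roughly like this (the command names, indices, and yylval fields are made-up stand-ins, not my actual ones):

    %{
    /* Illustrative indices into the internal system-command table. */
    enum { SYS_PRINT = 0, SYS_READLINE = 1 /* , ...200+ more */ };
    %}

    %%
    "print"     { yylval.ival = SYS_PRINT;    return SYSTEM_CMD; }
    "readline"  { yylval.ival = SYS_READLINE; return SYSTEM_CMD; }
        /* ...one literal pattern per system command, 200+ rules in total... */
    [a-zA-Z_][a-zA-Z0-9_]*  { yylval.sval = strdup(yytext); return IDENTIFIER; }
    %%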

As an approach, I think this makes it much faster than having to look up every single identifier in a hash table to see whether it's a system command.
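The alternative I'm comparing against would be a single identifier rule plus a lookup, roughly like this (lookup_system_cmd is a hypothetical function doing a hash or binary search over the command table):

    [a-zA-Z_][a-zA-Z0-9_]*  {
            int idx = lookup_system_cmd(yytext);  /* hypothetical lookup */
            if (idx >= 0) {
                yylval.ival = idx;        /* index into the command table */
                return SYSTEM_CMD;
            }
            yylval.sval = strdup(yytext);
            return IDENTIFIER;
        }

(I'm aware that a perfect-hash generator like gperf was designed for exactly this kind of fixed keyword set, which might make that lookup cheap enough.)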

The thing is, the Lexer is getting quite huge (in terms of resulting binary file size, I mean) rather fast. And I obviously don't like that.


Given that my focus is on something both lightning-fast (I'm already doing well on that front) and small enough to be embedded, what would be the recommended approach?

Dr.Kameleon
  • You don't identify them with either. You leave all that until a pseudo-link step, which is where you identify all functions and what they should do. Otherwise you bake them into the language, which leads to difficulties later on when you want to add more, and you bloat the language. At best you should consider them as Pascal does: as part of an imaginary enclosing scope whose content isn't specified by the language itself, or recognized by it. (See the sketch after these comments.) – user207421 Nov 20 '19 at 07:59
  • @user207421 Thank you. Do you have any references to learn more about this? If so, please post them. – Venkatesh Nandigama Nov 20 '19 at 13:44
  • What do you consider "quite huge"? I have a simple lexer with more than 500 symbols, which is about 70k (compiled with -Os). – rici Nov 21 '19 at 05:45
  • @VenkateshNandigama The Pascal User Manual and Report. – user207421 Nov 25 '19 at 00:56
  • @rici Yeah. I had a lexer for COBOL-85, which has > 400 reserved words, all separately recognized as patterns, and it wasn't huge, with maximum *flex* compression enabled. – user207421 Nov 25 '19 at 00:57
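A minimal sketch of the pseudo-link step described above, assuming the parser leaves every name as a plain identifier node and a later pass binds it; all types and names here are illustrative:

    /* All types and functions below are illustrative, not from a real codebase. */
    typedef struct Symbol { const char *name; int is_builtin; int index; } Symbol;
    typedef struct Scope Scope;                        /* opaque symbol table */
    Symbol *scope_lookup(Scope *sc, const char *name); /* hypothetical lookup */

    typedef struct Node {
        enum { NODE_IDENT, NODE_CALL /* , ... */ } kind;
        const char *name;      /* set for NODE_IDENT                      */
        int builtin_index;     /* -1, or filled in by the resolution pass */
        int nchildren;
        struct Node **children;
    } Node;

    /* Pseudo-link pass: walk the AST and bind identifiers against a
       pre-populated global scope holding the built-in commands. The
       Lexer then needs only one identifier rule, and adding a command
       means adding a scope entry, not touching the grammar. */
    void resolve(Node *n, Scope *globals)
    {
        if (n->kind == NODE_IDENT) {
            Symbol *s = scope_lookup(globals, n->name);
            if (s && s->is_builtin)
                n->builtin_index = s->index;
        }
        for (int i = 0; i < n->nchildren; i++)
            resolve(n->children[i], globals);
    }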

0 Answers