
I want to write a compiler for a language that delimits program blocks with whitespace, as Python does. I would prefer to do this in Python, but C++ is also an option. Is there an open-source lexer that makes this easy, for example by generating INDENT and DEDENT tokens properly, the way the Python lexer does? A corresponding parser generator would be a plus.

Elektito
  • This question is from almost ten years ago. I can't say I remember much. From the description though, this question prefers Python and the other one C. – Elektito May 16 '21 at 16:18
  • To provide context for the above out of the blue comment, there was a question about this being a duplicate of https://stackoverflow.com/questions/1413204/how-to-use-indentation-as-block-delimiters-with-bison-and-flex. The person closing this has deleted their question. FWIW, I don't think this is a duplicate. – Elektito May 18 '21 at 11:53

2 Answers


LEPL is a pure-Python parser library and supports offside (indentation-sensitive) parsing.

Cat Plus Plus

If you're using something like lex, you can do it this way:

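^[ \t]+              { /* Leading whitespace at the start of a line.
                          A line that starts in column 0 does not
                          trigger this rule. */
                       int new_indent = count_indent(yytext);
                       if (new_indent > current_indent) {
                          current_indent = new_indent;
                          return INDENT;
                       } else if (new_indent < current_indent) {
                          /* Returns a single DEDENT, even if several
                             blocks end on this line. */
                          current_indent = new_indent;
                          return DEDENT;
                       }
                       /* Otherwise do nothing. This way you can
                          essentially treat INDENT and DEDENT
                          as opening and closing braces. */
                     }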

You will need some additional logic, for example to ignore blank lines and to emit any pending DEDENT tokens at the end of the file.
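Since the question prefers Python, here is a minimal sketch of that extra logic written as a plain Python generator rather than a lex rule. It keeps a stack of open indentation widths, skips blank lines, emits one DEDENT per closed block (a single line can close several), and flushes the remaining DEDENTs at end of input. The name `indent_tokens` and the `LINE` token kind are made up for illustration, and the sketch assumes indentation uses spaces only.

```python
def indent_tokens(lines):
    """Yield (kind, text) pairs; kind is 'INDENT', 'DEDENT' or 'LINE'."""
    stack = [0]  # indentation widths of the blocks that are currently open
    for raw in lines:
        text = raw.rstrip("\n")
        if not text.strip():
            continue  # blank lines never change the block structure
        stripped = text.lstrip(" ")  # assumption: spaces only, no tabs
        width = len(text) - len(stripped)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", None)
        else:
            # One line may close several blocks, so pop in a loop.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", None)
            if width != stack[-1]:
                raise IndentationError(f"inconsistent dedent: {text!r}")
        yield ("LINE", stripped)
    # Close every block that is still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", None)


if __name__ == "__main__":
    source = ["if x:\n", "  if y:\n", "    a\n", "\n", "b\n"]
    for kind, text in indent_tokens(source):
        print(kind, text or "")
```

A real lexer would split each `LINE` into ordinary tokens; the point here is only the stack handling around them.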

count_indent would presumably expand tabs to spaces according to a tab-stop value.
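The answer does not show count_indent, so the following is only a guess at what it might do, sketched in Python to show the logic. The `tab_size` parameter is an assumption; the default of 8 matches the tab stop Python's own tokenizer uses.

```python
def count_indent(line, tab_size=8):
    """Return the column reached by the leading spaces and tabs of line."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # A tab advances to the next multiple of tab_size.
            column += tab_size - (column % tab_size)
        else:
            break  # first non-whitespace character ends the indentation
    return column


if __name__ == "__main__":
    for sample in ["    x", "\tx", "  \tx", "\t  x"]:
        print(repr(sample), count_indent(sample))
```

With this rule, two spaces followed by a tab land on the same column as a single tab, which is why mixing the two in one file is easy to get wrong.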

I don't know about lexer/parser generators for Python, but what I posted should work with lex/flex, and you can hook it up to yacc/bison to create a parser. You could use C or C++ with those.

parkovski
  • You have to be careful with this because you may need to add multiple DEDENT tokens at the start of a line, not just one. Python suggests having a stack to maintain this. – templatetypedef Aug 01 '11 at 19:43