I want to write a compiler for a language that delimits program blocks with whitespace, as Python does. I would prefer to do this in Python, but C++ is also an option. Is there an open-source lexer that makes this easy, for example by generating INDENT and DEDENT tokens properly, as the Python lexer does? A corresponding parser generator would be a plus.
This question is from almost ten years ago. I can't say I remember much. From the description, though, this question prefers Python and the other one C. – Elektito May 16 '21 at 16:18
To provide context for the out-of-the-blue comment above: there was a question about whether this is a duplicate of https://stackoverflow.com/questions/1413204/how-to-use-indentation-as-block-delimiters-with-bison-and-flex. The person closing this has since deleted their question. FWIW, I don't think this is a duplicate. – Elektito May 18 '21 at 11:53
2 Answers
If you're using something like lex, you can do it this way:
^[ \t]+ {
    /* Leading whitespace at the start of a line:
       compare it with the previous indentation level. */
    int new_indent = count_indent(yytext);
    if (new_indent > current_indent) {
        current_indent = new_indent;
        return INDENT;
    } else if (new_indent < current_indent) {
        current_indent = new_indent;
        return DEDENT;
    }
    /* Otherwise do nothing. This way you can essentially treat
       INDENT and DEDENT as opening and closing braces. */
}
You may need a little additional logic, for example to ignore blank lines, to handle lines with no leading whitespace (which the pattern ^[ \t]+ does not match), and to automatically add a DEDENT at the end of the file if needed.
Presumably count_indent would take into account converting tabs to spaces according to a tab-stop value.
I don't know about lexer/parser generators for Python, but what I posted should work with lex/flex, and you can hook it up to yacc/bison to create a parser. You could use C or C++ with those.
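The count_indent helper is not shown in the answer. As a rough illustration of the tab-stop logic it would need, here is a minimal sketch in Python (the asker's preferred language; in a lex scanner the same loop would be written in C). The tab_stop parameter and its default of 8 are assumptions, chosen because 8 is the tab width Python's own tokenizer uses.

```python
def count_indent(leading: str, tab_stop: int = 8) -> int:
    """Return the column reached after the leading whitespace of a line.

    A space advances one column; a tab advances to the next multiple of
    tab_stop. The default of 8 is an assumption, not part of the answer.
    """
    column = 0
    for ch in leading:
        if ch == "\t":
            # Jump to the next tab stop rather than adding a fixed width.
            column += tab_stop - (column % tab_stop)
        elif ch == " ":
            column += 1
        else:
            break  # stop at the first non-whitespace character
    return column
```

Counting columns this way means that two spaces followed by a tab and a single tab both land on column 8, so mixed indentation compares consistently.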

parkovski
You have to be careful with this, because you may need to emit multiple DEDENT tokens at the start of a line, not just one. Python suggests keeping a stack to track this. – templatetypedef Aug 01 '11 at 19:43
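The stack approach mentioned in this comment can be sketched in Python roughly as follows. This is a simplified, line-based illustration and not Python's actual tokenizer: the function name indent_tokens, the (kind, value) token shape, and the tab_stop default of 8 are all assumptions made for the example.

```python
def indent_tokens(source: str, tab_stop: int = 8):
    """Yield (kind, value) pairs where kind is INDENT, DEDENT, or LINE."""
    stack = [0]  # indentation levels currently open; column 0 is always open
    for lineno, raw in enumerate(source.splitlines(), start=1):
        text = raw.expandtabs(tab_stop)
        stripped = text.lstrip(" ")
        if not stripped:
            continue  # blank lines do not affect indentation
        width = len(text) - len(stripped)
        if width > stack[-1]:
            # Deeper than the current block: open exactly one new level.
            stack.append(width)
            yield ("INDENT", width)
        else:
            # Shallower: close one level per stack entry we pass.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", stack[-1])
            if width != stack[-1]:
                raise IndentationError(f"line {lineno}: inconsistent dedent")
        yield ("LINE", stripped)
    # Close every block still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", stack[-1])
```

For the input "if x:\n    if y:\n        z\nw\n", the final line w is preceded by two DEDENT tokens, which is the case a single current_indent variable cannot handle. A dedent to a column that was never opened is reported as an error, as Python itself does.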