18

I want my language to have two features that make Python such a nicely formatted language:

  • One statement per line
  • Blocks begin with a deeper indentation level and continue until the indentation returns to the enclosing level

Can anyone give me a detailed hint on how to achieve that with flex/bison-like tools? Such a block feature forces the user to write readable code.

Mamun
Lanbo

3 Answers

17

You could try to track the indentation level in the lexer and emit pseudo-tokens for indent and unindent. You will need to keep a stack of the indentation levels seen so far, and you will need to treat empty and comment-only lines specially. But I fear that in the end the lexer will become an unmaintainable mess, and you will also have parser-specific state (the indentation stack) living in your lexer.
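To make the indentation-stack idea concrete, here is a minimal sketch in Python rather than flex. The function name `tokenize_indentation` and the token names `INDENT`, `DEDENT` and `LINE` are made up for illustration, and it assumes indentation uses spaces only and that comments start with `#`. In a real flex lexer the same stack would live in the rule that matches leading whitespace.

```python
def tokenize_indentation(source):
    """Yield (kind, value) tokens: INDENT, DEDENT or LINE (hypothetical names)."""
    stack = [0]  # indentation widths of the currently open blocks
    for raw in source.splitlines():
        stripped = raw.strip()
        # Empty and comment-only lines never change the block structure.
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: indentation is made of spaces only.
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        else:
            # Close every block that is deeper than this line.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", stack[-1])
            if width != stack[-1]:
                raise SyntaxError(f"inconsistent dedent to column {width}")
        yield ("LINE", stripped)
    # Close every block still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", stack[-1])
```

With these pseudo-tokens in the stream, the grammar itself can treat `INDENT` and `DEDENT` like opening and closing braces.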

Yangshun Tay
Rudi
12

Matt Might wrote an article on standalone lexers, with a way of handling significant whitespace using "unput":

http://matt.might.net/articles/standalone-lexers-with-lex/

(The example is half-way down the page.)

wlangstroth
7

I think there is no way to build a parser for a Python-like syntax with ONLY lex/yacc, because lex/yacc can handle only context-free grammars, while a Python-like syntax is context-sensitive.

The reason is that, to decide whether a statement belongs to the same block as the previous one, the parser has to know the previous statement's indentation. That is the context.

I suggest adding some logic alongside lex/yacc to handle that; it won't be very hard. You could read the code here, in the "grammar" modules.

The key is to let the lex/yacc part parse a single statement together with its indentation level, and to write a separate pass that packs the statements into blocks.
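A rough sketch of that packing pass, in Python for brevity. It assumes the lex/yacc stage has already produced a flat list of `(indent, statement)` pairs. The name `pack_blocks` and the `[statement, children]` node shape are hypothetical, not taken from the linked code.

```python
def pack_blocks(statements):
    """Nest (indent, text) pairs into [text, children] nodes (hypothetical shape)."""
    root = []
    # Each entry: (indent of the owning statement, that statement's child list).
    stack = [(-1, root)]
    for indent, text in statements:
        # Leave every block whose owner is at this depth or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        node = [text, []]
        stack[-1][1].append(node)
        # The new statement may itself own a block of deeper statements.
        stack.append((indent, node[1]))
    return root
```

This version does not reject inconsistent indentation or check that a block owner ends in something like a colon; those checks would have to be added for a real language.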

Community
neuront
  • Say, would it be any "easier" doing this with Haskell's `Parsec`? I heard it's more than just context-free. – Lanbo May 17 '11 at 17:04