I'm trying to produce a LALR grammar for a very simple language composed of assignments. For example:

foo = "bar"
bar = 42

The language should also handle lists of values, for example:

foo = 1, 2, 3

But I also want to handle lists that span multiple lines:

foo = 1, 2
      3, 4

A trailing comma should be allowed too (for singletons and for flexibility):

foo = 1,
foo = 1, 2,

And obviously, both at the same time:

foo = 1,
      2,
      3,

I'm able to write a grammar that handles either trailing commas or multi-line lists, but not both at the same time.

My grammar looks like this:

content : content '\n'
        | content assignment
        | <empty>

assignment : NAME '=' value
           | NAME '=' list

value : TEXT
      | NUMBER

list : ???

Note: I need the '\n' in the grammar to forbid this kind of code:

foo
=
"bar"

Thanks in advance,

Antoine.

  • You *could* look at how JavaScript, Go and Scala (and probably more, those were off the top of my head) infer semicolons. But be warned that this leads to gotchas (expressions extending over newlines) and quite a few programmers *hate* it. Perhaps you should add more restrictions (such as "only expressions inside parens/brackets/braces can extend over multiple lines", which is what Python does). –  Mar 13 '12 at 22:23
  • Actually, my language is not a programming language but a configuration format, so there are no expressions. I considered adding a delimiter around the list, but I'd prefer to do without one if I can. – Antoine Mar 13 '12 at 22:35
  • Could you provide a link to your parsing code so that we can play with your grammar and see what works and what doesn't? – Rik Poggi Mar 14 '12 at 08:30
  • Yes, it's available here: https://gist.github.com/ed4b5152a707b0ad2696 . You can just launch the script to call the test_parser function and print the list2_ content (or a parsing error :)). – Antoine Mar 14 '12 at 09:25

2 Answers

It looks like your configuration language is essentially free form. I would forget about making newline a token in the grammar. If you want the newline restrictions, you can hack it as some lexical tie-in rules, whereby the parser calls a little API added to the lexer to inform the lexer about where it is in the grammar, and the lexer can decide whether to accept newlines or reject them with an error.

Try this grammar.

%token NAME NUMBER TEXT

%%

config_file : assignments
            | /* empty */
            ;

assignments : assignment
            | assignments assignment
            ;

assignment : NAME '=' values comma_opt
           ;

comma_opt : ',' | /* empty */;

values : value
       | values ',' value
       ;

value : NUMBER | TEXT ;

It builds for me with no conflicts. I didn't run it, but a casual reading of y.output looks like the transitions are sane.

This grammar, of course, allows

foo = 1, 2, 3, bar = 4, 5, 6 xyzzy = 7 answer = 42

without additional communication with the lexer.
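To make that concrete, here is a minimal Python sketch of the same rules, written as a hand-rolled recursive-descent parser rather than generated by yacc. The token regex and the `tokenize` and `parse` names are assumptions made for illustration; they are not taken from the asker's script. Newlines are skipped like any other whitespace, so the one-line input above is accepted.

```python
import re

# Assumed token shapes: identifiers, unsigned integers, double-quoted text, '=' and ','.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<NAME>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|"(?P<TEXT>[^"]*)"|(?P<PUNCT>[=,]))'
)
VALUE_KINDS = ("NUMBER", "TEXT")


def tokenize(source):
    """Return (kind, value) pairs; whitespace, including newlines, is dropped."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        group, text = m.lastgroup, m.group(m.lastgroup)
        if group == "PUNCT":
            tokens.append((text, text))        # '=' and ',' are their own kind
        elif group == "NUMBER":
            tokens.append(("NUMBER", int(text)))
        else:
            tokens.append((group, text))
        pos = m.end()
    tokens.append(("EOF", None))
    return tokens


def parse(source):
    """config_file : assignments | empty, with no newline restrictions."""
    tokens, i, result = tokenize(source), 0, []
    while tokens[i][0] == "NAME":
        name = tokens[i][1]
        if tokens[i + 1][0] != "=":
            raise SyntaxError(f"expected '=' after {name}")
        i += 2
        if tokens[i][0] not in VALUE_KINDS:
            raise SyntaxError(f"expected a value after {name} =")
        values = [tokens[i][1]]
        i += 1
        # values : values ',' value, followed by comma_opt
        while tokens[i][0] == ",":
            i += 1
            if tokens[i][0] in VALUE_KINDS:
                values.append(tokens[i][1])
                i += 1
            else:
                break                          # trailing comma
        result.append((name, values))
    if tokens[i][0] != "EOF":
        raise SyntaxError(f"unexpected token {tokens[i][1]!r}")
    return result


print(parse("foo = 1, 2, 3, bar = 4, 5, 6 xyzzy = 7 answer = 42"))
```

One token of lookahead after a comma is enough to tell a trailing comma from a list separator, which is the same decision the LALR parser makes when it reduces comma_opt.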

Your restrictions mean that newlines are only allowed within the values. Two NAME tokens must never appear on the same line, and the = must appear on the same line as the preceding NAME (and probably the first value must as well).

Basically when the parser scans the first value, it can tell the lexer "values are being scanned now, turn on the admission of newlines". And then when the comma_opt is reduced, this can be turned off again. When comma_opt is reduced, the lexer may have already read the NAME token of the next assignment, but it can check that this occurs on a different line from the previous NAME. You will want your lexer to keep track of an accurate line count anyway.
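The line checks can be sketched in Python as follows. This is a simplified stand-in, not a real yacc lexical tie-in: the lexer just tags every token with its line number, and a hand-written parser compares those numbers. The token regex and the `tokenize` and `parse` names are assumptions made for illustration.

```python
import re

# Assumed token shapes: identifiers, unsigned integers, double-quoted text, '=' and ','.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<NAME>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|"(?P<TEXT>[^"\n]*)"|(?P<PUNCT>[=,]))'
)
VALUE_KINDS = ("NUMBER", "TEXT")


def tokenize(source):
    """Return (kind, value, line) triples with a 1-based line count."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        group, text = m.lastgroup, m.group(m.lastgroup)
        line = source.count("\n", 0, m.start(group)) + 1
        if group == "PUNCT":
            tokens.append((text, text, line))
        elif group == "NUMBER":
            tokens.append(("NUMBER", int(text), line))
        else:
            tokens.append((group, text, line))
        pos = m.end()
    tokens.append(("EOF", None, -1))
    return tokens


def parse(source):
    tokens, i, result = tokenize(source), 0, []
    last_name_line = None                      # line of the previous NAME
    while tokens[i][0] == "NAME":
        _, name, name_line = tokens[i]
        if name_line == last_name_line:
            raise SyntaxError(f"line {name_line}: two assignments on one line")
        last_name_line = name_line
        eq_kind, _, eq_line = tokens[i + 1]
        if eq_kind != "=" or eq_line != name_line:
            raise SyntaxError(f"line {name_line}: '=' must follow {name} on the same line")
        i += 2
        kind, value, line = tokens[i]
        if kind not in VALUE_KINDS or line != name_line:
            raise SyntaxError(f"line {name_line}: first value must be on the same line")
        values = [value]
        i += 1
        # From here on newlines are admitted: later values may sit on any line.
        while tokens[i][0] == ",":
            i += 1
            if tokens[i][0] in VALUE_KINDS:
                values.append(tokens[i][1])
                i += 1
            else:
                break                          # trailing comma
        result.append((name, values))
    if tokens[i][0] != "EOF":
        raise SyntaxError(f"unexpected token {tokens[i][1]!r}")
    return result


print(parse('foo = 1,\n      2,\n      3,\nbar = "x"\n'))
```

With this sketch, `foo = 1, 2 bar = 3` and the three-line `foo` / `=` / `"bar"` input both raise SyntaxError. The check only compares the lines of consecutive NAME tokens, as described above, so a NAME that shares a line with the last value of a multi-line list would still be accepted; comparing against the line of the previous token instead would be stricter.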

Kaz
  • I've tested your idea, maybe not exactly as you described, but it works :-). Thanks! For the record, I've added a check which raises an exception if the "current line" recorded in the parser is the same as the current line in the lexer, and which sets the "current line" otherwise. This check is called in each assignment or section rule. – Antoine Mar 15 '12 at 19:58

I don't really have much experience with this, but would this work?

listvalue : value ','
          | value '\n'
          | value ',' '\n'

list : listvalue list
     | listvalue
aquavitae