
I've been working through "Modern Compiler Implementation in ML", converting SML to OCaml as I go. The book defines a language called Tiger that has a let ... in ... end syntax for declaring types, variables, and functions in scope for a given expression. Additionally, adjacent declarations of the same kind should be grouped together to allow for mutual recursion.

I've tried to represent this in Menhir with the following grammar snippet:

%right FUNCTION TYPE

.
.
.

decs: l = list(dec) { l }

dec:
  | l = nonempty_list(tydec) { A.TypeDec l }
  | v = vardec { v }
  | l = nonempty_list(fundec) { A.FunctionDec l }

tydec:
  | TYPE; name = ID; EQUAL; ty = ty {
      A.{
        type_name = Symbol.symbol name;
        type_ty = ty;
        type_pos = Position.make $startpos $endpos
      }
    }

With this, I get a shift/reduce conflict, but Menhir resolves it the way I'd like. I want the nonempty_list(tydec) to be greedy so adjacent TYPE declarations are grouped together. I.e., with Menhir resolving the conflict, my generated AST looks something like:

(LetExp
  (decs
    ((TypeDec
      (((type_name (my_type)) (type_ty (NameTy (int))))
       ((type_name (my_type2)) (type_ty (NameTy (string))))
    ))))
  (body (SeqExp ())))

I'd like to get rid of the warning, but I can't figure out how to resolve the conflict the same way Menhir does. I've tried using %inline tydec, which does make the warning go away, but the shift of TYPE isn't applied as I would expect. Instead, preference is given to the list in decs, yielding an AST that looks like this:

(LetExp
  (decs
    ((TypeDec
      (((type_name (my_type)) (type_ty (NameTy (int))))))
     (TypeDec
      (((type_name (my_type2)) (type_ty (NameTy (string)))
  )))))
  (body (SeqExp ())))

I've also tried explicitly setting the precedence, but Menhir warns me that it's a useless declaration.

I'm sure I'm missing something fundamental here. Given productions that yield lists of lists, how can I make the inner list greedy?

nirvdrum

1 Answer

As far as I remember, you cannot specify the precedence of one rule over another (as you can for productions within the same rule with %prec). Maybe I'm wrong, but if not, I can understand why it's impossible: if you end up in such a situation, you may have made a logical error. I'll try to explain.

Let's say we have some language with the following syntax:

vardef  i = 42
        j = 24
typedef all_new_int  = int
        all_new_bool = bool

In this case it's quite logical to define something like this:

decs: l = list(dec) { l }

dec:
  | TYPEDEF; l = nonempty_list(tydec) { A.TypeDec l }
  | ...

In this case, because of the typedef keyword, we don't have any conflicts. Now, suppose there is no such "separator" but simply:

    var i = 42
    var j = 24
    type all_new_int  = int
    type all_new_bool = bool

Why try to regroup these two type declarations? It's not a block (as in the previous example) but two separate declarations, so the AST should be coherent with the language. I know it's not the answer you're looking for, but what I'm trying to say is that you don't need nonempty_list in dec:

decs: l = list(dec) { l }

dec:
  | t = tydec { A.TypeDec [t] }
  | v = vardec { v }
  | f = fundec { A.FunctionDec [f] }

In that case, maybe your dec doesn't need to build a list at all. Yes, your AST will be the same as with %inline tydec, but it's coherent with the language.

By the way, from the Menhir documentation:
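If the flat AST is kept but later passes still want one node per run of declarations, the runs could be merged after parsing instead of inside the grammar. The following is only a minimal sketch under assumptions: the `dec` type is a simplified, hypothetical stand-in for the book's `A.dec` (plain strings instead of records with positions), and `group_decs` is a name made up for illustration.

```ocaml
(* Hypothetical, simplified stand-in for the book's AST: the real A.dec
   carries records with positions rather than plain strings. *)
type dec =
  | TypeDec of string list      (* types declared together *)
  | VarDec of string
  | FunctionDec of string list  (* functions declared together *)

(* Merge runs of adjacent TypeDec (or FunctionDec) nodes into a single node,
   preserving source order. Any other declaration ends the current run. *)
let rec group_decs (decs : dec list) : dec list =
  match decs with
  | TypeDec a :: TypeDec b :: rest -> group_decs (TypeDec (a @ b) :: rest)
  | FunctionDec a :: FunctionDec b :: rest ->
      group_decs (FunctionDec (a @ b) :: rest)
  | d :: rest -> d :: group_decs rest
  | [] -> []

let example =
  group_decs
    [ TypeDec [ "my_type" ]; TypeDec [ "my_type2" ]; VarDec "x";
      FunctionDec [ "f" ]; FunctionDec [ "g" ]; TypeDec [ "t" ] ]
(* example =
   [TypeDec ["my_type"; "my_type2"]; VarDec "x";
    FunctionDec ["f"; "g"]; TypeDec ["t"]] *)
```

This keeps the grammar free of conflicts at the cost of an extra pass over the declaration list. The repeated `@` is quadratic for very long runs, which is unlikely to matter for hand-written programs.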

actual+ is syntactic sugar for nonempty_list(actual)


Edit:

If you don't want to change your structure (for some reason), you can always rewrite your rules. For instance, these two grammars accept the same input and are meant to build the same result:

1) With a shift/reduce conflict

%token <int> INT
%token NONE
%token EOF

%start <int option list list> main

%%

main: l = op_list* EOF { l }

op_list:
      l = num+ { l }
    | NONE   { [None] }

num: i = INT { Some i }

2) Without a shift/reduce conflict

%token <int> INT
%token NONE
%token EOF

%start <int option list list> main

%%

main: ll=main2 EOF { ll }

main2:
    { [] }
    | n=num ll=main2 { match ll with
                       (* n is already an int option; prepend it to keep source order *)
                       | (Some _ :: _ as l)::ll -> (n::l)::ll
                       | _ -> [n]::ll
                     }
    | NONE ll=main2 { [None]::ll }

num: i=INT { Some i }

Once again, when I see 0 NONE 1 2 NONE 3 here, I think of [0; None; 1; 2; None; 3] and not [[0]; [None]; [1; 2]; [None]; [3]], but if the second form is simpler for later use, then fine. I'm sure you can do this with %prec and company (%left, %right, ...), but in any case you need to rewrite your rules. When you have a conflict you need to resolve it; there is no magic.
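To see what the merging in the second grammar's semantic actions is meant to compute, the same idea can be written as a plain OCaml function over a token list. This is a sketch, not Menhir output: the `token` type and the function name `main2` are assumptions chosen to mirror the grammar above.

```ocaml
(* Hypothetical token type mirroring the %token declarations above. *)
type token = INT of int | NONE

(* Mirrors the right-recursive main2 rule: the tail is processed first, then
   the current number is merged into the head group when that group already
   holds numbers; otherwise it starts a new group. *)
let rec main2 (tokens : token list) : int option list list =
  match tokens with
  | [] -> []
  | INT i :: rest -> (
      match main2 rest with
      | (Some _ :: _ as group) :: groups -> (Some i :: group) :: groups
      | groups -> [ Some i ] :: groups)
  | NONE :: rest -> [ None ] :: main2 rest

let result = main2 [ INT 0; NONE; INT 1; INT 2; NONE; INT 3 ]
(* result = [[Some 0]; [None]; [Some 1; Some 2]; [None]; [Some 3]] *)
```

Because the rule is right-recursive, each number is prepended to the group built from the tokens after it, which is what keeps the elements in source order.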

6.3 How are severe conflicts resolved in the end?

It is unspecified how severe conflicts are resolved. Menhir attempts to mimic ocamlyacc’s specification, that is, to resolve shift/reduce conflicts in favor of shifting, and to resolve reduce/reduce conflicts in favor of the production that textually appears earliest in the grammar specification. However, this specification is inconsistent in case of three-way conflicts, that is, conflicts that simultaneously involve a shift action and several reduction actions. Furthermore, textual precedence can be undefined when the grammar specification is split over multiple modules. In short, Menhir’s philosophy is that severe conflicts should not be tolerated, so you should not care how they are resolved.

vonaka
  • Thanks for taking the time to write up an answer. Unfortunately, the language is already specified. While I could change it, I wouldn't be able to parse the set of test files associated with the book. You're right that it would be a lot easier with a separator, rather than being implied. At the core of it though, I'd like the `tydec` list to be associated with a single `A.TypeDec`. Generating separate `A.TypeDec` instances complicates the semantic analysis considerably. – nirvdrum Aug 12 '17 at 00:02
  • Just another point, too, in case it was lost in my original question. If I allow the warning to stay and let Menhir arbitrarily resolve the conflict, I get my desired behavior. Perhaps it's flawed reasoning, but I assume there's a way for me to direct Menhir to perform the same resolution without the warning. – nirvdrum Aug 12 '17 at 00:12
  • Thanks for the edit. If `%prec` can't be used and `%right` doesn't do what I want, I guess I don't have too many other options. I'll look at a more invasive rule rewrite like the one you've proposed. Not to sound like a broken record, but since Menhir is already resolving the conflict the way I wanted, I was hoping I had just overlooked something obvious and simpler. – nirvdrum Aug 13 '17 at 13:23