Set a rule based on the value of a global variable

Question

In my lexer and parser, built with ocamllex and ocamlyacc, I have a .mly file as follows:

%{
  open Params
  open Syntax
%}

main:
| expr EOF { $1 }

expr:
| INTEGER { EE_integer $1 }
| LBRACKET expr_separators RBRACKET { EE_brackets (List.rev $2) }

expr_separators:
  /* empty */  { [] }
| expr         { [$1] }
| expr_separators ...... expr { $3 :: $1 }

In params.ml, a variable separator is defined. Its value is either ';' or ',', and it is set by the upstream system.

In the .mly, I want the rule expr_separators to be defined based on the value of Params.separator. For example, when Params.separator is ';', only [1;2;3] is considered an expr, whereas [1,2,3] is not. When Params.separator is ',', only [1,2,3] is considered an expr, whereas [1;2;3] is not.

Does anyone know how to amend the lexer and parser to achieve this?

PS:

The value of Params.separator is set before parsing and does not change during parsing.

At the moment, the lexer returns the token COMMA for ',' and SEMICOLON for ';'. In the parser, there are other rules that involve COMMA or SEMICOLON.

I just want to define a rule expr_separators such that, when Params.separator is ';', it considers ';' and ignores ',' (which may be parsed by other rules), and when Params.separator is ',', it considers ',' and ignores ';' (which may be parsed by other rules).

IMHO table-driven parsers aren't amenable to this kind of thing. There is a fixed grammar encoded into their tables. The tables are based on a static analysis of the productions and their constituent tokens, and won't work if the tokens change identities dynamically. You could handle this in your lexer if the comma and semicolon don't show up elsewhere in the grammar. You could consider allowing both comma and semicolon all the time, but verifying after the parse that the correct separator has been used. — Jeffrey Scofield, Aug 22 '20 at 00:33
What are you going to do if the wrong separator is used? Raise a syntax error? Parse it in some other idiosyncratic way (like recognizing the non-separator as an identifier)? Or just let the parse fail? Unless the answer is something like option 2, there is really no advantage to trying to restrict the parse, when you can easily do the check in the reduction action. — rici, Aug 22 '20 at 00:39
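Both comments point toward keeping the grammar fixed and checking the separator in the reduction action. Below is a minimal sketch of that idea, assuming Params.separator is a char and that the list rule accepts either COMMA or SEMICOLON. The separator ref, the Wrong_separator exception and check_separator are hypothetical names, and the types only mirror the constructor names in the question. To let the example run on its own, a small hand-written parser stands in for the ocamlyacc-generated one and calls the check at the point where ocamlyacc would reduce expr_separators. Whether accepting both tokens in that rule causes conflicts with the other rules that use them depends on the rest of the grammar, which isn't shown.

```ocaml
(* Hypothetical stand-ins for the asker's Syntax types and lexer tokens. *)
type expr = EE_integer of int | EE_brackets of expr list

type token = INTEGER of int | LBRACKET | RBRACKET | COMMA | SEMICOLON | EOF

(* Assumed representation of Params.separator: set once before parsing. *)
let separator = ref ';'

exception Wrong_separator of char

(* The check a reduction action would run. In the .mly it might appear as
     | expr_separators COMMA expr     { check_separator ','; $3 :: $1 }
     | expr_separators SEMICOLON expr { check_separator ';'; $3 :: $1 } *)
let check_separator c =
  if c <> !separator then raise (Wrong_separator c)

(* Hand-written stand-in for the generated parser. An expr is an INTEGER,
   or LBRACKET, a possibly empty list of exprs separated by COMMA or
   SEMICOLON, then RBRACKET. *)
let parse (tokens : token list) : expr =
  let rest = ref tokens in
  let peek () = match !rest with t :: _ -> t | [] -> EOF in
  let advance () = match !rest with _ :: tl -> rest := tl | [] -> () in
  let expect t = if peek () = t then advance () else failwith "syntax error" in
  let rec expr () =
    match peek () with
    | INTEGER n -> advance (); EE_integer n
    | LBRACKET ->
        advance ();
        let xs = if peek () = RBRACKET then [] else items [ expr () ] in
        expect RBRACKET;
        EE_brackets xs
    | _ -> failwith "syntax error"
  (* acc holds the items in reverse order, like $1 in the .mly rule. *)
  and items acc =
    let sep = match peek () with COMMA -> Some ',' | SEMICOLON -> Some ';' | _ -> None in
    match sep with
    | None -> List.rev acc
    | Some c ->
        advance ();
        let e = expr () in
        (* The point where ocamlyacc would reduce expr_separators SEP expr. *)
        check_separator c;
        items (e :: acc)
  in
  let e = expr () in
  expect EOF;
  e
```

With separator set to ';', the tokens for [1;2;3] parse to EE_brackets [EE_integer 1; EE_integer 2; EE_integer 3], while the tokens for [1,2,3] raise Wrong_separator ','. Raising a dedicated exception rather than a generic syntax error is one possible design choice; it lets the caller report which separator was expected.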

rici · Answer 1 · 2020-08-22T04:44:41.370

In some ways, this request is essentially the same as asking a macro preprocessor to alter its substitution at runtime, or a compiler to alter the type of a variable. As with the program itself, once the grammar has been compiled (whether into executable code or a parsing table), it's not possible to go back and modify it. At least, that's the case for most LR(k) parser generators, which produce deterministic parsers.

Moreover, it seems unlikely that the only difference the configuration parameter makes is the selection of a single separator token. If the non-selected separator token "may be parsed by other rules", then it may be parsed by those other rules when it is the selected separator token, unless the configuration setting also causes those other rules to be suppressed. So at a minimum, it seems like you'd be looking at something like:

expr : general_expr
expr_list : expr
%if separator is comma
expr : expr_using_semicolon
expr_list : expr_list ',' expr
%else
expr : expr_using_comma
expr_list : expr_list ';' expr
%endif

Without a more specific idea of what you're trying to achieve, the best suggestion I can provide is that you write two grammars and select which one to use at runtime, based on the configuration setting. Presumably the two grammars will be mostly similar, so you can probably use your own custom-written preprocessor to generate both of them from the same input text, which might look a bit like the above example. (You can use m4, which is a general-purpose macro processor, but you might feel the learning curve is too steep for such a simple application.)
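As a rough illustration of such a custom preprocessor, the sketch below understands only the %if separator is comma / %else / %endif notation from the example grammar above (one condition, no nesting) and returns the lines of one grammar variant. The function name select_variant is made up for this example; a real setup would read the shared source from a file and write the two .mly outputs.

```ocaml
(* Hypothetical mini-preprocessor for the %if / %else / %endif notation in
   the example grammar. It knows one condition, "separator is comma", does
   not support nesting, and returns the lines of one grammar variant. *)
let select_variant ~comma (lines : string list) : string list =
  (* state = None outside a conditional; Some keep inside a branch, where
     keep says whether the branch belongs to the requested variant. *)
  let step (state, out) line =
    match String.trim line, state with
    | "%if separator is comma", None -> (Some comma, out)
    | "%if separator is comma", Some _ -> failwith "nested %if not supported"
    | "%else", Some keep -> (Some (not keep), out)
    | "%endif", Some _ -> (None, out)
    | ("%else" | "%endif"), None -> failwith "%else or %endif without %if"
    | _, (None | Some true) -> (state, line :: out)
    | _, Some false -> (state, out)
  in
  let state, out = List.fold_left step (None, []) lines in
  if state <> None then failwith "unterminated %if";
  List.rev out
```

Running select_variant ~comma:true and ~comma:false over one shared source would give the two .mly files, which ocamlyacc compiles separately; the program then picks a parser at startup from Params.separator. One caveat: each ocamlyacc-generated parser declares its own token type, so the lexer would probably need two variants as well, which the same kind of preprocessing could produce.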

Parser generators which produce general parsers have an easier time with run-time modifications; many such parser generators have mechanisms which can do that (although they are not necessarily efficient mechanisms). For example, the Bison tool can produce GLR parsers, in which case you can select or deselect specific rules using a predicate action. The OCaml GLR generator Dypgen allows sets of rules to be dynamically added to the grammar during the parse. (I've never used dypgen, but I keep on meaning to try it; it looks interesting.) And there are many others.

Having played around with dynamic parsing features in some GLR parsers, I can only say that my personal experience has been a bit mixed. Modifying grammars at run-time is a brittle technique; grammars tend not to be easy to split into independent pieces, so modifying one grammar rule can have consequences in places you don't expect to be affected. You don't always know exactly what language your parser accepts, because the dynamic modifications can be hard to predict. And so on. My suggestion, if you try this technique, is to start with the simplest modification possible and put a lot more effort into grammar tests (which is always a good idea, anyway).

@SoftTimur: yes, although as I said they could both be generated from the same source grammar. — rici, Aug 22 '20 at 03:21
