1

I have the following grammar in a Bisone file:

item
    : "ITEM" t_name t_type v_storage t_prefix t_tag ';'
    ;
t_name
    : [$_A-Za-z][$_A-Z0-9a-z]*
    ;
t_type
    : "BYTE"
    | "WORD"
    | "LONG"
    | "QUAD"
    ;
v_storage
    : %empty
    | "TYPEDEF"
    ;
t_prefix
    : %empty
    | "PREFIX" t_name
    ;
t_tag
    : %empty
    | "TAG" t_name
    ;

When I attempt to parse the following string ITEM foobar BYTE PREFIX str_ TAG S TYPEDEF; I get an unexpected 'TYPEDEF" and it accepts the ";". Is there something I need to do to allow any order to be specified? If so, I'm hoping that there is a simple solution. Otherwise, I'll need to do a little more work.

Cœur
  • 37,241
  • 25
  • 195
  • 267
JonBelanger
  • 150
  • 3
  • 12
  • Is this the same question? https://stackoverflow.com/questions/13786896/how-would-i-declare-a-grammar-rule-for-a-number-of-tags-in-any-order – Barmar Sep 26 '18 at 23:10
  • `t_name : [$_A-Za-z][$_A-Z0-9a-z]*` is not valid bison (or yacc). Please clarify. What tool are you using, exactly? – rici Sep 27 '18 at 01:10
  • I realize that, it is just what that symbol represents. t_name comes from flex and is comprised of alpha-numerics and '$' and '_'. It is not actually in the Bison file. – JonBelanger Sep 27 '18 at 10:37
  • @Barmar: Yes, I missed that one when I did my search. Thanks for pointing it out. – JonBelanger Sep 27 '18 at 10:40
  • Possible duplicate of [How would I declare a grammar rule for a number of tags in any order?](https://stackoverflow.com/questions/13786896/how-would-i-declare-a-grammar-rule-for-a-number-of-tags-in-any-order) – Barmar Sep 27 '18 at 15:35

1 Answers1

1

It is not possible to tell bison (or yacc) that order doesn't matter. Rules are strictly ordered.

So you have two options:

  1. List all possible orders. If you do this, watch out for ambiguities caused by optional productions. You'll actually need to list all orders and subsets. That mounts up exponentially.

  2. Just accept any list of components, as a list. That will accept repeated components so you'll need to catch that in the semantic action if you care.

The second option is almost always the one you want. Implementation is usually trivial, because you will want to store the components somewhere; as long as that somewhere has a unique value (such as NULL) which means "not yet set", then you only need to test that value before setting it. For example rather than the one in the question):

%{
   #include <stdbool>
   enum Type { 
     TYPE_DEFAULT = 0, TYPE_BYTE, TYPE_WORD, TYPE_LONG, TYPE_QUAD
   };
   typedef struct Item Item;
   struct Item {
     const char *name;
     enum Type   type;
     int         storage; /* 0: unset, 1: TYPEDEF */
     const char *prefix;
     const char *tag;
  };
  // ...
  // Relies on the fact that NULL and 0 are converted to boolean 
  // false. Returns true if it's ok to do the set (i.e. thing
  // wasn't set).
  bool check_dup(bool already_set, const char* thing) {
    if (already_set) 
      fprintf(stderr, "Duplicate %s ignored at line %d\n", thing, yylineno);
    return !already_set;
  }
%}

%union {
   const char *str;
   Item  *item;
   // ...
}

%type <item> item item-def
%token <str> NAME STRING

%%
/* Many of the actions below depend on $$ having been set to $1.
 * If you use a template which doesn't provide that guarantee, you
 * will have to add $$ = $1; to some actions.
 */
item: item-def { /* Do whatever is necessary to finalise $1 */ }
item-def
    : "ITEM" NAME
               { $$ = calloc(1, sizeof *$$); $$->name = $2; }
    | item-def "BYTE"
               { if (check_dup($$->type, "type") $$->type = TYPE_BYTE; }
    | item-def "WORD"
               { if (check_dup($$->type, "type") $$->type = TYPE_WORD; }
    | item-def "LONG"
               { if (check_dup($$->type, "type") $$->type = TYPE_LONG; }
    | item-def "QUAD"
               { if (check_dup($$->type, "type") $$->type = TYPE_QUAD; }
    | item-def "TYPEDEF"
               { if (check_dup($$->storage, "storage") $$->storage = 1; }
    | item-def "PREFIX" STRING
               { if (check_dup($$->prefix, "prefix") $$->prefix = $3; }
    | item-def "TAG" STRING
               { if (check_dup($$->tag, "tag") $$->tag = $3; }

You can separate all those item-def productions into something like:

item-def: "ITEM" NAME   { /* ... */ }
        | item-def item-option
item-option: type | storage | prefix | tag

But then in the actions you need to get at the item object, which is not part of the option production. You can do that with a Bison feature which lets you look into the parser stack:

prefix: "PREFIX" STRING { if (check_dup($<item>0->prefix, "prefix")
                            $<item>0->prefix = $2; }

In this context, $0 will refer to whatever came before prefix, which is whatever came before item-option, which is an item-def. See the end of this section in the Bison manual, where it describes this practice as "risky", which it is. It also requires you to explicitly specify the tag, because bison doesn't do the grammar analysis necessary to validate the use of $0, which would identify its type.

rici
  • 234,347
  • 28
  • 237
  • 341
  • That's what I was afraid of. So, what I'll do is have an option list where order is not particularly indicated. – JonBelanger Sep 27 '18 at 10:39
  • @JonBelanger: Added sample code. It's really not that complicated. – rici Sep 27 '18 at 15:23
  • 1
    You suggested exactly what I did. I actually changed the Bison code to better understand what is being parsed and to recognize what is valid, invalid, present or not. Extremely similar to what you suggested above. The added side benefit is that the grammar is far simpler. Thanks for all your help. – JonBelanger Sep 28 '18 at 14:29
  • 1
    @jon: cool. I added the sample code because when I reviewed the answer, I thought it didn't put enough emphasis on why the better solution is not complicated. Possible future readers might benefit from seeing an implementation skeleton. – rici Sep 28 '18 at 14:56