
I'm new to Bison and have been trying for a long time to write rules for array literals, concatenation, and more. I can't figure out why I get a shift/reduce conflict here, or how to resolve it:

arr:
  T_OPEN expr       {$$ = (void *)(new vector<int>());((vector<int>*)$$)->push_back($2);}
| arr ',' expr      {((vector<int>*)($1))->push_back($3);}
| arr T_CLOSE       {}
| arr '@' arr       {/*will add it later*/}

T_OPEN is "[" and T_CLOSE is "]". @ is supposed to concatenate the two arrays. arr is of type void*. The part :

arr '@' arr

causes the shift/reduce conflict. Any solution would be much appreciated.

Shirli

1 Answer


Productions like

arr: arr '@' arr;

will always produce shift-reduce conflicts because they are ambiguous. Suppose you had two @ operators in source:

...1 @ ...2 @ ...3

Is this supposed to be parsed as:

arr1: ...1 @ ...2
arr2: ...3
arr3: arr1 @ arr2

or

arr1: ...1
arr2: ...2 @ ...3
arr3: arr1 @ arr2

In other words, does @ associate to the left or to the right? The grammar doesn't specify, and so it is ambiguous.

A common way to solve that is with precedence declarations (see the Bison manual), but it can also be written directly in the grammar (see below).

However, even leaving the @ operator aside, your grammar does not really do what you want. To start with, you probably want basic array literals to be according to this grammar: [Note 1]

arr: '[' expr_list ']'    { $$ = $2; }
   | '[' ']'              { $$ = new std::vector<int>; }
expr_list
   : expr                 { $$ = new std::vector<int>; $$->push_back($1); }
   | expr_list ',' expr   { $1->push_back($3); }

And then you can define concatenation expressions:

arr_concat
   : arr
   | arr_concat '@' arr   { std::copy($3->begin(), $3->end(),
                                      std::back_inserter(*$1));
                            delete $3; // Note 2
                          }

Note that the above production is explicit about the associativity of @. No ambiguity is possible.
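
The work done by the `@` action can also be tried outside the parser. Here is a minimal sketch in plain C++, where `concat_into` is a hypothetical helper name (not something Bison provides) standing in for the action body, with `lhs` playing the role of `$1` and `rhs` the role of `$3`:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper mirroring the semantic action for
//   arr_concat '@' arr
// lhs corresponds to $1, rhs to $3.
std::vector<int>* concat_into(std::vector<int>* lhs, std::vector<int>* rhs) {
    // Append every element of the right operand to the left operand.
    std::copy(rhs->begin(), rhs->end(), std::back_inserter(*lhs));
    // The right operand has been absorbed, so it is freed here (see Note 2).
    delete rhs;
    // Returning lhs corresponds to the default action $$ = $1.
    return lhs;
}
```

Because the rule is left-recursive, an input such as `[1]@[2]@[3]` would call this twice, each time growing the same left-hand vector instead of allocating a new one.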


Notes:

  1. Here I assume that you have declared a semantic union one of whose types is std::vector<int>*, because all those void* casts are ugly and unsafe. It would be something like:

    %union {
       std::vector<int>*   array_pointer;
       // ...
    }
    %type <array_pointer> arr expr_list arr_concat
    %%
    
  2. The delete is necessary to avoid a memory leak, but where you do it depends on your approach to memory management. Unfortunately, once you decide to keep pointers to vectors rather than actual vectors (and you have little choice in this case, since copying vectors with each reduction would be ridiculous), you become responsible for memory management. Deleting semantic values as soon as they have been incorporated into other semantic values is a simple solution, but it means you must be careful about other copies of the pointer to the allocated object. If you are careful to only ever have one pointer to each allocated object, the immediate delete will do the trick; otherwise, you'll need some kind of garbage collection, possibly based on reference counting.
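
The "only ever one pointer to each allocated object" discipline from Note 2 can be illustrated with `std::unique_ptr`. This is only a sketch under assumptions: a classic `%union` holds raw pointers, so the code below would apply as written only outside the semantic-value stack, or with a parser set up for C++ value types (for example Bison's C++ skeleton with variant semantic values). `IntVec` and `concat` are hypothetical names chosen for the example.

```cpp
#include <memory>
#include <utility>
#include <vector>

using IntVec = std::vector<int>;  // hypothetical alias for the example

// Takes ownership of both operands and returns the merged vector.
// Each vector has exactly one owner at any time, so no explicit
// delete is needed.
std::unique_ptr<IntVec> concat(std::unique_ptr<IntVec> lhs,
                               std::unique_ptr<IntVec> rhs) {
    lhs->insert(lhs->end(), rhs->begin(), rhs->end());
    // rhs is destroyed automatically when the function returns.
    return lhs;
}
```

A caller hands over ownership explicitly, as in `auto r = concat(std::move(a), std::move(b));`, after which `a` and `b` are empty and cannot be deleted twice by accident.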

rici
  • Thank you very much. But first, I did add the precedence declaration %left '@' and it still didn't work. Also, the second way you offered is really problematic in my program, because of the timing at which I have to initialize the vectors. I just can't figure out how to do it the way you offered, and I've tried it before. – Shirli Nov 29 '15 at 23:33
  • @shirli: your `%left` declaration probably didn't help because of the conflict with `,`. That conflict is an error, unless I misunderstand what you mean by concatenation. Is the desired syntax not `[2,4]@[7]`? I don't at all get your issue with "timing". – rici Nov 30 '15 at 04:08
  • @shirli: I added semantic rules, based on my assumption about what you mean the syntax and semantics of `@` to be. The `expr_list` actions are not that different from your `arr` actions, aside from removing all the ugly void* casts. I don't see any "timing" issue. – rici Nov 30 '15 at 04:56
  • Thank you. You were right about the timing: you've shown me a way to make what bothered me about it work. But your solution is still problematic for what I need, because: 1. arr_concat is supposed to be an arr as well; I'd like to use it like any other array of type arr in other functions I need to implement. 2. For instance, if I add | '(' arr_concat ')' {$$ = $2;} to arr_concat, then ([1]@[2])@[3] will work, but [1]@([2]@[3]) won't. Adding " | arr '@' arr_concat " to arr_concat results in a shift/reduce conflict. – Shirli Nov 30 '15 at 11:36
  • Also, why would there be any conflict with ','? – Shirli Nov 30 '15 at 11:40
  • @shirli: If you want to parenthesize array concatenation, you'd need to add `'(' arr_concat ')'` to `arr`, not to `arr_concat`. Look at any standard expression grammar for other examples. The names of non-terminals are, of course, arbitrary. You could use `arr_term` and `arr`, respectively, instead of `arr` and `arr_concat`, if you want the topmost non-terminal to be called `arr`. – rici Nov 30 '15 at 15:23
  • The conflict with `,` in the original grammar comes from the ambiguity with `...1 @ ...2 , expr`. Here, either `...1 @ ...2` or `...2 , expr` can be reduced to `arr`; in both cases, the result can also be reduced to `arr`, and the grammar does not let you decide which interpretation is correct. There are other, more important problems with your original formulation; in particular, using the first and then repeatedly using the third production for `arr`, your grammar would accept `[expr]]]]]]]`. – rici Nov 30 '15 at 15:28