Multiplication by juxtaposition in yacc

Question

I'm trying to implement a grammar that allows multiplication by juxtaposition. This is for parsing polynomial inputs for a CAS.

It works quite well, except for a few edge cases, as far as I'm aware. I have identified two problems:

1. Conflict with other rules, e.g., a^2 b is (erroneously) parsed as (^ a (* 2 b)), not as (* (^ a 2) b).
2. yacc (bison) reports 28 shift/reduce conflicts and 8 reduce/reduce conflicts.

I'm pretty sure properly resolving the first issue will resolve the second as well, but so far I haven't been successful.

The following is the gist of the grammar that I'm working with:

%start  prgm
%union {
    double  num;
    char    *var;
    ASTNode *node;
}
%token  <num>   NUM
%token  <var>   VAR
%type   <node>  expr

%left   '+' '-'
%left   '*' '/'
%right  '^'
%%
prgm:     // nothing
    | prgm '\n'
    | prgm expr '\n'
    ;
expr:     NUM
    | VAR
    | expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | expr '^' expr
    | expr expr %prec '*'
    | '-' expr
    | '(' expr ')'
    ;
%%

Removing the rule for juxtaposition (expr expr %prec '*') resolves the shift/reduce & reduce/reduce warnings.

Note that ab in my grammar should mean (* a b). Multi-character variables must be preceded by a quote ('); this is already handled in the lex file. The lexer ignores spaces and tabs (\t) entirely.

I'm aware of this question, but the use of juxtaposition here does not seem to indicate multiplication.

Any comments or help would be greatly appreciated!

P.S. If it helps, this is the link to the entire project.

The semantics of the juxtaposed operator have absolutely nothing to do with the syntax. The semantics don't even form part of the grammar. So the linked question is an exact duplicate, as far as I can see. (Anyway, afaics the semantics are multiplication there, too.) — rici, Mar 27 '21 at 14:48
@rici Thank you for the comment. I tried to apply the solution given in the linked question by replacing all of my `expr` with `expr_sequence` and adding a new non-terminal `expr_sequence: expr | expr_sequence expr`, but it still failed. Can you please elaborate on where to replace my `expr`s with `expr_sequence`? — Jay Lee, Mar 27 '21 at 14:59

rici · Accepted Answer · 2021-03-28T02:45:56.867

As indicated in the answer to the question you linked, it is hard to specify the operator precedence of juxtaposition because there is no operator to shift. (As in your code, you can specify the precedence of the production expr: expr expr. But what lookahead token will this reduction be compared with? Adding every token in FIRST(expr) to your precedence declarations is not very scalable, and might lead to unwanted precedence resolutions.)

An additional problem with the precedence solution is the behaviour of the unary minus operator (an issue not addressed in the linked question), because as written your grammar allows a - b to be parsed either as a subtraction or as the juxtaposed multiplication of a and -b. (And note that - is in FIRST(expr), leading to one of the possibly unwanted resolutions I referred to above.)

So the best solution, as recommended in the linked question, is to use a grammar with explicit precedence, such as the following. (Here, I used juxt as the name of the non-terminal, rather than expr_sequence.)

%start  prgm
%token  NUM
%token  VAR

%left   '+' '-'
%left   '*' '/'
%right  '^'
%%
prgm:     // nothing
    | prgm '\n'
    | prgm expr '\n'
expr: juxt
    | '-' juxt
    | expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | expr '^' expr
juxt: atom
    | juxt atom
atom: NUM
    | VAR
    | '(' expr ')'

This grammar may not be what you want:

- Its rather simple-minded handling of unary minus has a couple of issues. I don't think it's problematic that it parses -xy into -(xy) instead of (-x)y, but it's not ideal. Also, it doesn't allow --x (again, probably not a problem, but not ideal). Finally, it does not parse -x^y as -(x^y), but as (-x)^y, which is contrary to frequent practice.
- In addition, it binds juxtaposition too tightly. You might or might not consider it a problem that a/xy parses as a/(xy), but you would probably object to 2x^7 being parsed as (2x)^7.

The simplest way to avoid those issues is to use a grammar in which operator precedence is uniformly implemented with unambiguous grammar rules.

Here's an example which implements standard precedence rules (exponentiation takes precedence over unary minus; juxtaposing multiply has the same precedence as explicit multiply). It's worth taking a few minutes to look closely at which non-terminal appears in which production, and think about how that correlates with the desired precedence rules.

%union {
    double  num;
    char    *var;
    ASTNode *node;
}
%token  <num>   NUM
%token  <var>   VAR
%type   <node>  expr mult neg expt atom

%%
prgm:     // nothing
    | prgm '\n'
    | prgm error '\n'
    | prgm expr '\n'
expr: mult
    | expr '+' mult
    | expr '-' mult
mult: neg
    | mult '*' neg
    | mult '/' neg
    | mult expt
neg : expt
    | '-' neg
expt: atom
    | atom '^' neg
atom: NUM
    | VAR
    | '(' expr ')'
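
To see how the cascade groups a given input, here is a self-contained sketch of the same grammar with actions that print each expression in prefix form. Everything outside the grammar rules is an assumption made for illustration: the `node` helper, the string-valued `%union` (the question uses `ASTNode *`), and the tiny hand-written `yylex`, which only handles non-negative integers and single-letter variables and relies on the POSIX function `strdup`.

```yacc
%{
/* Sketch only: prints prefix notation instead of building an ASTNode. */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }

/* Hypothetical helper: returns "(op l r)", or "(op l)" when r is NULL.
   Takes ownership of l and r and frees them. */
static char *node(const char *op, char *l, char *r) {
    size_t n = strlen(op) + strlen(l) + (r ? strlen(r) : 0) + 5;
    char *s = malloc(n);
    if (r) snprintf(s, n, "(%s %s %s)", op, l, r);
    else   snprintf(s, n, "(%s %s)", op, l);
    free(l);
    free(r);
    return s;
}
%}
%union { char *str; }
%token <str> NUM VAR
%type  <str> expr mult neg expt atom
%%
prgm: /* nothing */
    | prgm '\n'
    | prgm error '\n'     { yyerrok; }  /* values discarded here are leaked */
    | prgm expr '\n'      { puts($2); free($2); }
    ;
expr: mult
    | expr '+' mult       { $$ = node("+", $1, $3); }
    | expr '-' mult       { $$ = node("-", $1, $3); }
    ;
mult: neg
    | mult '*' neg        { $$ = node("*", $1, $3); }
    | mult '/' neg        { $$ = node("/", $1, $3); }
    | mult expt           { $$ = node("*", $1, $2); }  /* juxtaposition */
    ;
neg : expt
    | '-' neg             { $$ = node("-", $2, NULL); }
    ;
expt: atom
    | atom '^' neg        { $$ = node("^", $1, $3); }
    ;
atom: NUM
    | VAR
    | '(' expr ')'        { $$ = $2; }
    ;
%%
/* Assumed lexer: skips blanks, reads integers and one-letter variables. */
int yylex(void) {
    int c;
    while ((c = getchar()) == ' ' || c == '\t')
        ;
    if (c == EOF) return 0;
    if (isdigit(c)) {
        char buf[32];
        int n = 0;
        do { if (n < 31) buf[n++] = (char)c; c = getchar(); } while (isdigit(c));
        ungetc(c, stdin);
        buf[n] = '\0';
        yylval.str = strdup(buf);
        return NUM;
    }
    if (isalpha(c)) {
        char buf[2] = { (char)c, '\0' };
        yylval.str = strdup(buf);
        return VAR;
    }
    return c;  /* operators, parentheses and '\n' are returned as themselves */
}

int main(void) { return yyparse(); }
```

If the file is saved as juxt.y (an arbitrary name), it can be built with `bison -o juxt.c juxt.y && cc -o juxt juxt.c`. Feeding it `a^2 b` prints `(* (^ a 2) b)`, and `-x^y` prints `(- (^ x y))`, which matches the precedence rules described above.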

Thank you for the comprehensive answer! The provided grammar indeed resolves the second issue. As you mentioned in the post, the juxtaposition does bind tightly, and comprehends `a^2 b` as `a^(2b)`, which wasn't the one I desired. Is there a way to remedy this situation, or is it a limitation of yacc/LALR parser? — Jay Lee, Mar 28 '21 at 00:11
@JayLee: No, you can get bison to parse pretty well anything, as long as it's unambiguous. But `expr: expr '+' expr` is *not* unambiguous, and you're relying on precedence declarations to disambiguate. That works until your grammar doesn't fit the precedence model. Juxtaposing multiplication is such a point. At that point, you need to write an unambiguous grammar. — rici, Mar 28 '21 at 02:38
Not only could I implement the grammar as I intended it to work, but your answer also helped me a lot to understand how yacc works through the grammar. I now have a better grasp of constructing an unambiguous grammar in terms of explicit precedence declarations & introducing new non-terminals. Thank you again for the help! — Jay Lee, Mar 28 '21 at 10:00
Thank you. Thank you. I've been hunting around for the last hour, and this is exactly what I've been looking for. — Frank Yellin, Apr 08 '22 at 21:02
Hmm. How difficult would it be to change this grammar so that ab/cd parsed as (ab)/(cd)? In my mind, implicit multiplication binds slightly more tightly than explicit multiplication. — Frank Yellin, Apr 09 '22 at 03:55
@Frank: It's easy, if that's what you want. You just need to make implicit multiplication bind slightly more tightly :-). (Just follow the cascade pattern.) — rici, Apr 09 '22 at 04:08
@Rici. Unfortunately not. You have to make sure that "a b - c" parses the way you expect and not as a * b * (-c). The juxtaposition case in the above code uses "expt" for the right side rather than "neg". I'm not sure how to imitate that when juxtaposition *can* have multiplication as one of its arguments, but not negation on the right. — Frank Yellin, Apr 09 '22 at 04:14
@Frank: in `mult` change all the `neg` to `juxt` and drop the fourth rule. Add the non-terminal `juxt` after `mult`, with productions `juxt: neg` and `juxt: juxt expt`. (Juxtaposition cannot have explicit multiplication as one of its arguments. It can only have juxtaposition as an argument.) — rici, Apr 09 '22 at 05:09
@rici. Thanks! It's been about 40 years since I've dealt with CFGs and LALR(1) parsers, so I'm a little bit rusty at how to make them unambiguous. — Frank Yellin, Apr 09 '22 at 16:52
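
For reference, the change rici describes in the comment above could look like the following rules section. This is a sketch written from that description, not code posted in the thread; the only new name is `juxt`, which would also need to be added to the `%type <node>` declaration.

```yacc
expr: mult
    | expr '+' mult
    | expr '-' mult
    ;
mult: juxt                /* every neg in mult became juxt ...            */
    | mult '*' juxt
    | mult '/' juxt
    ;                     /* ... and the "mult expt" rule was dropped     */
juxt: neg                 /* new level: juxtaposition binds tighter than  */
    | juxt expt           /* explicit '*' and '/'                         */
    ;
neg : expt
    | '-' neg
    ;
expt: atom
    | atom '^' neg
    ;
atom: NUM
    | VAR
    | '(' expr ')'
    ;
```

With these rules ab/cd groups as (ab)/(cd), because the right operand of '/' is now a whole `juxt`. The input a b - c still parses as (ab) - c: the right side of a juxtaposition is an `expt`, and '-' cannot start an `expt`.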
