Can yacc be used to generate three address code for Java 1?

Question

I have read that yacc generates a bottom-up parser for LALR(1) grammars. I have a grammar for Java 1 that can be used for generating three-address code and is strictly LALR(1), but the translation scheme I am employing makes it L-attributed. Now I have read that L-attributed LR grammars cannot be translated during bottom-up parsing. So, can yacc be used here or not? And if yes, how does yacc get around this problem?

"Java 1"? You mean Java 1.0, released January **1996**? — Andreas, Apr 15 '20 at 17:27
@Andreas Yes, you are correct. — Shashank Kumar, Apr 15 '20 at 17:28

rici · Answer 1 · 2020-04-17T18:19:14.143

You're not going to get a good answer unless you ask a specific, detailed question. Here's a vague sketch of an approach.

Synthesized attributes are obviously not a problem for a bottom-up parser, since they are computed in the action that runs when the corresponding non-terminal is reduced. So the question comes down to "How can a bottom-up parser compute inherited attributes?"
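
To make that concrete for three-address code, here is a minimal sketch in Python rather than yacc. The class TacGen and its methods leaf and binop are hypothetical stand-ins for the end-of-rule actions of the usual grammar E → E + T | T, T → T * F | F, F → id, and the calls are made in the order an LR parser would perform the reductions for a + b * c. Each action needs only the synthesized addresses of its own right-hand side.

```python
from typing import List


class TacGen:
    """Holds the emitted three-address code and a counter for temporaries."""

    def __init__(self) -> None:
        self.code: List[str] = []
        self.ntemps = 0

    def new_temp(self) -> str:
        self.ntemps += 1
        return f"t{self.ntemps}"

    def leaf(self, name: str) -> str:
        # F -> id: the address is the identifier itself.
        # The unit rules T -> F and E -> T just pass this address upward.
        return name

    def binop(self, left: str, op: str, right: str) -> str:
        # E -> E + T and T -> T * F: allocate a temporary, emit one
        # instruction, and synthesize the temporary as the new address.
        t = self.new_temp()
        self.code.append(f"{t} = {left} {op} {right}")
        return t


# Reduction order of a bottom-up parse of "a + b * c".
g = TacGen()
a = g.leaf("a")
b = g.leaf("b")
c = g.leaf("c")
bc = g.binop(b, "*", c)
res = g.binop(a, "+", bc)
```

Afterwards g.code is ["t1 = b * c", "t2 = a + t1"] and res is "t2".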

Since the grammar is L-attributed, we know that any inherited attribute is computed from the attributes of its left siblings. Yacc/bison allows actions to be inserted anywhere in a right-hand side, and these "Mid-Rule Actions" (MRAs) are executed as they are encountered. An MRA has available to it precisely its left siblings, which is all that is needed to compute an inherited attribute.

However, that doesn't show how the attribute can actually be inherited. An MRA inserted just before a grammar symbol in some rule can certainly compute an inherited attribute of that symbol, but the value is actually used inside the productions for that symbol, typically combined with the synthesized attributes of the symbol's children, and those productions have no direct reference to the MRA.

To accomplish that, we need to do two things:

1. Insert an MRA just before the non-terminal to gather together the left-sibling attributes. Since MRAs are also grammar symbols, this MRA will be the last left sibling, in effect the youngest uncle of the non-terminal's children. (You don't necessarily need an MRA; you can insert a "marker": a non-terminal whose only production is empty and whose action is the MRA body. But that's less convenient, because the marker's action cannot name the preceding grammar symbols directly and must fetch their semantic values from the parser stack. Or you could split the production into two pieces, so that both actions are final.)
2. Access the uncle's attributes in the non-terminal's reduction actions.

Bison/yacc allow the second step by letting you use a non-positive symbol index to refer to slots in the parser stack below the current right-hand side. In particular, $0 refers to the symbol immediately preceding the non-terminal in the parent production (what I called the uncle above). Of course, for that to work, you have to ensure that the uncle is the same non-terminal (or at least has the same semantic type) in every production in which the non-terminal appears. This may require adding some markers.

Speaking of semantic values, you may be able to satisfy yourself that all the uncles of a given non-terminal are the same, or at least have the same type. But bison does not do this analysis, so it can't warn you if you get it wrong. Be careful! And as another consequence, you have to tell bison what the type is, so you can't just write $0: you need $<tag>0.
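
The mechanism can be simulated outside bison. The following Python sketch is a hand-driven shift/reduce parse (not a generated LALR table) of the textbook declaration grammar D → T L, T → int | float, L → L , id | id, an assumed example rather than part of the Java grammar. The hypothetical helper reduce hands each action the value stored just below its right-hand side, which is what $0 denotes; in a real bison grammar the L actions would have to write $<tag>0 with whatever tag T's value uses. Because L only ever appears immediately after T, the uncle is the same in both L productions, which is exactly the condition described above. The sketch assumes well-formed input and does no error checking.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Entry:
    symbol: str  # grammar symbol on the parser stack
    value: Any   # its semantic value


# An action receives the RHS values ($1..$n) and the value just below them ($0).
Action = Callable[[List[Any], Optional[Any]], Any]


def reduce(stack: List[Entry], lhs: str, n: int, action: Action) -> None:
    """Pop n RHS entries, run the action, and push the LHS with its value."""
    base = len(stack) - n
    rhs = [e.value for e in stack[base:]]
    dollar0 = stack[base - 1].value if base > 0 else None
    del stack[base:]
    stack.append(Entry(lhs, action(rhs, dollar0)))


def parse_declaration(tokens: List[str]) -> List[Tuple[str, str]]:
    """Parse D -> T L, T -> int | float, L -> L , id | id.

    L's inherited type is read through $0 instead of being passed down.
    """
    stack: List[Entry] = []
    for tok in tokens:
        if tok in ("int", "float"):
            stack.append(Entry(tok, tok))                  # shift
            reduce(stack, "T", 1, lambda rhs, _: rhs[0])   # T -> int | float
        elif tok == ",":
            stack.append(Entry(",", tok))                  # shift
        else:
            stack.append(Entry("id", tok))                 # shift
            if stack[-2].symbol == "T":
                # L -> id: $0 is the T just below the RHS
                reduce(stack, "L", 1, lambda rhs, d0: [(rhs[0], d0)])
            else:
                # L -> L , id: $0 is the same T, below all three RHS symbols
                reduce(stack, "L", 3, lambda rhs, d0: rhs[0] + [(rhs[2], d0)])
    reduce(stack, "D", 2, lambda rhs, _: rhs[1])           # D -> T L
    return stack[0].value
```

For example, parse_declaration(["float", "x", ",", "y", ",", "z"]) returns [("x", "float"), ("y", "float"), ("z", "float")].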

Note:

It is not always possible to handle inherited attributes in an L-attributed LR grammar, because at the moment in which the non-terminal is encountered, the parser may not yet know that the non-terminal will in fact form part of the parse tree. This problem does not occur in LL grammars, because in LL parsing the parser can only predict a non-terminal which is guaranteed to be present in the parse if the rest of the input is valid.

Any LL grammar can be parsed bottom-up, so there is no problem with L-attributed LL grammars. But the bottom-up parser can do better than that; it doesn't require that the full grammar be LL. Only those decision points for non-terminals which are about to be assigned an inherited attribute need to be LL-deterministic.

This restriction comes from the technique itself: placing an MRA or a marker immediately before the non-terminal forces the parser to reduce the MRA, and so commit to the production, before it has seen the non-terminal. In other words, adding a marker (or an MRA) at certain points of an LR grammar might invalidate the LR property. There is a good discussion of this issue in the bison manual, so I won't elaborate on it here, except to observe one detail.

The technique outlined above for propagating inherited attributes uses MRAs (or markers) at strategic points to hold the inherited attributes. These productions must be reduced in order to proceed with the parse, so as noted in the above-mentioned section of the bison manual it may be necessary to rearrange the grammar in order to remove conflicts. In rare cases, this rewriting is not even possible.

However, removing the conflict might still result in a grammar in which an inherited attribute is propagated in case some non-terminal needs the value, without any guarantee that the non-terminal will eventually be reduced. In this case, the inherited attribute will be needlessly computed and then later ignored. But that shouldn't be a problem. Inherent in the concept of attributes is the idea that attributes are functional; in other words, that the computation is free of side-effects.

The absence of side effects means that an attribute grammar parser should be free to evaluate attributes in any order which respects attribute dependency. In particular, this means that you can trivially achieve correct evaluation of attributes by turning the attribute computations into closures that are only evaluated once their inputs are known, a technique sometimes referred to as lazy evaluation or "thunking".
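
A minimal Python sketch of that idea, using the assumed declaration grammar D → T L, L → L , id | id (the function names are hypothetical): each L reduction synthesizes a closure that still waits for the inherited type, and only the reduction of D, which knows T, forces it. None of the L actions look below their own right-hand side, so they do not depend on which production L ends up in. Since the closures are pure, building one that is never forced costs nothing beyond the allocation.

```python
from typing import Callable, List, Tuple

Decls = List[Tuple[str, str]]
# Synthesized attribute of L: a closure from the (not yet known)
# inherited type to L's list of declarations.
LThunk = Callable[[str], Decls]


def l_single(name: str) -> LThunk:
    # L -> id
    return lambda typ: [(name, typ)]


def l_append(rest: LThunk, name: str) -> LThunk:
    # L -> L , id : wrap the inner closure; nothing is evaluated yet
    return lambda typ: rest(typ) + [(name, typ)]


def decl(typ: str, lthunk: LThunk) -> Decls:
    # D -> T L : the type is known here, so force the closure
    return lthunk(typ)


# Reductions in bottom-up order for "... x, y"
ids = l_append(l_single("x"), "y")
print(decl("int", ids))    # [('x', 'int'), ('y', 'int')]
print(decl("float", ids))  # [('x', 'float'), ('y', 'float')]
```

Forcing the same closure twice with different types gives two independent results, which is only safe because the closures have no side effects.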

But there is always the temptation to use MRAs precisely in order to perform side-effects. One very common such side effect is printing three-address code to the output stream. Another one is mutating persistent data structures such as symbol tables. That's no longer L-attributed parsing, and so the suggestions offered here might not work for such applications.

All this is fine, but isn't there some issue with this kind of bottom-up parsing when the grammar is LR? The dragon book says: — Shashank Kumar, Apr 17 '20 at 05:58
@ShashankKumar: your quote from the dragon book didn't appear. — rici, Apr 17 '20 at 14:49
It says: "Can we handle every LR grammar and L-attributed SDD bottom-up? We cannot, as the following intuitive argument shows. Suppose we have a production A → B C in an LR-grammar, and there is an inherited attribute B.i that depends on inherited attributes of A. When we reduce to B, we still have not seen the input that C generates, so we cannot be sure that we have a body of production A → B C. Thus, we cannot compute B.i yet, since we are unsure whether to use the rule associated with this production." — Shashank Kumar, Apr 17 '20 at 14:56
@ShashankKumar: right, that's correct. A bottom-up parser can handle any L-attributed *LL* grammar, but there are particular cases of inherited attributes in LR grammars which are difficult. But it often works. I'll add a note to my answer. — rici, Apr 17 '20 at 15:06
@ShashankKumar: anyway, the theoretical discussion isn't much help in a single concrete exercise. Can bison handle your grammar? Almost certainly, since bison has been used for decades in similar applications. So you should give it a try and if you encounter a specific problem you cannot resolve, ask here a detailed, specific question. Those are much easier to answer. — rici, Apr 17 '20 at 15:11
Ok, how do you determine which LR grammars can be parsed bottom-up? I mean, how do you identify the attributes that are hard? I think I would figure it out myself if I thought about it enough, but it would be great if you could help me. For instance, if you have a commonly used grammar for boolean expressions where inherited attributes B.true and B.false store the true and false exits of a boolean expression, will bottom-up parsing work? — Shashank Kumar, Apr 17 '20 at 15:14
@ShashankKumar: it will work if you're allowed to use symbolic labels. Otherwise, you need to backpatch (which is really a strategy for resolving symbolic labels). But without thinking about it too much, I suspect that if you had to supply the precise target location of the exits, then it's not L-attributed. — rici, Apr 17 '20 at 15:19
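
For the boolean-expression case, the usual backpatching scheme (as in the Dragon Book's section on backpatching) replaces the inherited B.true and B.false with synthesized lists of incomplete jumps, plus an empty marker M that records the index of the next instruction. Below is a Python sketch of that scheme; Emitter, Jumps, rel, marker, or_ and and_ are hypothetical names, the calls mimic bottom-up reduction order for x < 100 || x > 200 && x != y, and the final targets 6 and 7 stand in for whatever an enclosing statement would supply. Emitting and patching instructions is a side effect, so this is not pure attribute evaluation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Emitter:
    # Each instruction is [text, target]; target stays None until backpatched.
    instrs: List[list] = field(default_factory=list)

    def next_instr(self) -> int:
        return len(self.instrs)

    def emit(self, text: str) -> int:
        self.instrs.append([text, None])
        return len(self.instrs) - 1

    def backpatch(self, holes: List[int], target: int) -> None:
        for i in holes:
            self.instrs[i][1] = target

    def listing(self) -> List[str]:
        return [f"{i}: {text} {tgt}" for i, (text, tgt) in enumerate(self.instrs)]


@dataclass
class Jumps:
    truelist: List[int]   # jumps still waiting for the "true" exit
    falselist: List[int]  # jumps still waiting for the "false" exit


def rel(e: Emitter, cond: str) -> Jumps:
    # B -> E1 relop E2: a conditional jump followed by an unconditional one
    t = e.emit(f"if {cond} goto")
    f = e.emit("goto")
    return Jumps([t], [f])


def marker(e: Emitter) -> int:
    # M -> (empty): remember where the next instruction will go
    return e.next_instr()


def or_(e: Emitter, b1: Jumps, m: int, b2: Jumps) -> Jumps:
    # B -> B1 || M B2: if B1 is false, go on to test B2
    e.backpatch(b1.falselist, m)
    return Jumps(b1.truelist + b2.truelist, b2.falselist)


def and_(e: Emitter, b1: Jumps, m: int, b2: Jumps) -> Jumps:
    # B -> B1 && M B2: if B1 is true, go on to test B2
    e.backpatch(b1.truelist, m)
    return Jumps(b2.truelist, b1.falselist + b2.falselist)


def example() -> List[str]:
    # Reductions for "x < 100 || x > 200 && x != y" in bottom-up order.
    e = Emitter()
    b1 = rel(e, "x < 100")
    m1 = marker(e)
    b2 = rel(e, "x > 200")
    m2 = marker(e)
    b3 = rel(e, "x != y")
    b23 = and_(e, b2, m2, b3)
    b = or_(e, b1, m1, b23)
    # Hypothetical enclosing statement: true exit at 6, false exit at 7.
    e.backpatch(b.truelist, 6)
    e.backpatch(b.falselist, 7)
    return e.listing()
```

In the result, instruction 1 (the false exit of x < 100) jumps to 2, where x > 200 is tested, and every remaining exit is resolved only when the enclosing construct supplies its targets.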
