A notation for empty right-hand sides of rules

Question

When writing a ("theoretical") grammar with a rule with an empty right-hand side, one always uses a symbol such as ε (or 1) to make this emptiness explicit:

A → ε | a A

Such a grammar in Yacc and others would then look like

a: | 'a' a

or "worse"

a:       { $$ = new_list(); }
 | a 'a' { $$ = $1; $$->append($2); }
 ;

The fact that in "real world grammars" (Yacc, Bison, etc.) this empty right-hand side part of the rule is not explicitly marked as empty troubles me: it is easy to miss the fact that an rhs is empty, or worse: to forget to insert | and actually use a mid-rule action:

a:       { $$ = new_list(); }
   a 'a' { $$ = $1; $$->append($2); }
 ;
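
To make the danger concrete, here is a rough sketch of how Bison reads that last rule. The first braces become a mid-rule action, which Bison implements as a hidden empty nonterminal; `$@1` is the kind of name it shows in its reports, and it cannot be typed in a grammar file, so the snippet is illustrative only.

```yacc
/* Illustrative desugaring only: not valid Bison input. */

/* Hidden nonterminal standing for the mid-rule action. */
$@1: /* empty */ { $$ = new_list(); } ;

/* What is left of a: one rule, with no empty alternative any more. */
a: $@1 a 'a' { /* here $1 is the mid-rule value, $2 the nested a, $3 the 'a' */ } ;
```

In this small example `a` is left without a non-recursive alternative, so it can no longer derive any finite string; in a rule with other alternatives the slip may be harder to spot.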

1) I don't know of any tool that provides a means to make empty rhs explicit. Are there any?

Future versions of Bison might support a dedicated symbol, with errors when used in a non-empty rhs, and warnings when an implicitly empty rhs is left.

2) Do people consider this useful?

3) What would be the notation you'd suggest?

Currently, the candidate is $empty:

a: $empty { $$ = new_list(); }
 | a 'a'  { $$ = $1; $$->append($2); }
 ;

EDIT

The chosen syntax is %empty:

a: %empty { $$ = new_list(); }
 | a 'a'  { $$ = $1; $$->append($2); }
 ;

Indeed $empty looks like a pseudo-symbol, such as $accept that Bison generates for the initial rule, or the $@n pseudo-symbols for mid-rule actions, or $eof for, well, end-of-file. But it's definitely not a symbol, it is precisely the absence of symbols.

On the other hand % clearly denotes a directive (some kind of attribute/metadata), like %prec.

So it's a minor difference of syntax, but it's more consistent with the overall syntax. Credit goes to Joel E. Denny.
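
For reference, a minimal sketch of the notation in a complete grammar file, assuming a Bison release that supports `%empty` (3.0 or later). The file name `list.y` and the token `ITEM` are made up for the example.

```yacc
/* list.y: hypothetical example file */
%token ITEM

%%

list: %empty        /* explicitly empty right-hand side */
    | list ITEM
    ;

/* Not accepted: %empty may only mark a right-hand side with no symbols.
   bad: %empty ITEM ;
*/
```

Running `bison -Wempty-rule list.y` should additionally warn about any empty right-hand side that is not marked with `%empty`.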

score 8 · Answer 1 · answered Feb 01 '13 at 20:35


I usually just use a comment:

a: /*epsilon*/ { $$ = new_list(); }
 | a 'a'  { $$ = $1; $$->append($2); }
 ;

Works fine with no changes and makes the intent clear....

IMO, this comes under the heading "If it ain't broke, don't fix it"


Chris Dodd


I also use a comment, and I can't recall having made such a mistake. Yet I have seen students fail in various ways, I much prefer explicit over implicit (I use a comment because there is no alternative), and I also prefer checks as early as possible (compile time rather than run time). – akim Feb 02 '13 at 06:54
POSIX Yacc does not support %empty [-Wyacc], so I put comments like those mentioned above, but that produces reduce/reduce conflicts. How do I resolve a reduce/reduce conflict between an empty and a non-empty production? – amalp12 Mar 05 '23 at 14:50

@amalp12: An empty rule is an empty rule, so if you have a reduce/reduce conflict it doesn't matter if you use `%empty` or `/* empty */` or just blank space. If the reduce/reduce conflict is a problem for you, you need to remove it by refactoring the grammar, same as any other reduce/reduce conflict. Generally, this depends on the context(s) in which the rule is used. – Chris Dodd Mar 05 '23 at 20:55
How do I give priority to the production that doesn't give null, and only take the null one if the other production doesn't apply? – amalp12 Mar 07 '23 at 09:08
Priority of the production (for resolving reduce/reduce conflicts) comes from the order in the source file. So `a : | a 'a' ;` will give priority to the empty rule and `a: a 'a' | ;` will give priority to the non-empty. – Chris Dodd Mar 07 '23 at 20:45
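
A minimal sketch of the situation discussed in these comments, written with comment markers so that it stays within POSIX Yacc. The nonterminals `s`, `x` and `y` are hypothetical. Both empty rules can be reduced on the same lookahead `'z'`, which is a reduce/reduce conflict however the empty right-hand sides are written:

```yacc
%%

s: x 'z'
 | y 'z'
 ;

x: /* empty */ ;   /* listed first: wins the reduce/reduce conflict */
y: /* empty */ ;   /* never reduced once the conflict is resolved in favor of x */
```

Reordering `x` and `y` only changes which rule wins; removing the conflict means refactoring, for instance merging `x` and `y` into one nonterminal if they are meant to play the same role.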

score 3 · Answer 2 · answered Feb 01 '13 at 21:14


I'd suggest the following:

Define the declaration:

%empty ID

whose semantics are two-fold:

1) ID may be used as the only non-rule token in a RHS, to indicate that the RHS is an epsilon production; and

2) epsilon productions not marked with ID are to be considered syntax errors.

Thus, with the declaration:

%empty epsilon

epsilon must be used to mark an empty RHS; without any %empty declaration, the status quo holds, where empty RHSs are unmarked (except perhaps with comments).

That would allow users who like to explicitly mark empty RHSs to do so immediately, without having any impact on existing grammar files or users who don't want to explicitly mark empty RHSs in this fashion.
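
As a sketch of how a grammar would read under this proposal: the declaration form `%empty epsilon` is hypothetical syntax from this answer, not something any released Bison accepts (released Bison uses a bare `%empty` inside the rule). The nonterminal `b` is made up for the example.

```yacc
/* Hypothetical syntax: illustrates the proposal only. */
%empty epsilon          /* declare epsilon as the empty-RHS marker */

%%

a: epsilon { $$ = new_list(); }   /* marked: accepted */
 | a 'a'
 ;

b:                                /* unmarked empty RHS: would be a syntax error */
 | 'b'
 ;
```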

Personally, I'd probably use such a declaration, although to be honest I'm pretty used to using a comment to mark an empty RHS, and I don't believe I've ever accidentally made an empty RHS. So I wouldn't mark it as a priority feature-request, but nor would I object to its implementation.


rici


Using a directive is an interesting idea! But it seems like you believe that it would be an error not to use `$empty`: that's not what I have in mind, it would only be a warning, `-Wempty`, disabled by default. But I prefer to have a unique name for the empty keyword, so that someone reading someone else's grammar wouldn't be mistaken on the nature of this pseudo token. – akim Feb 02 '13 at 06:58
Hi! Well, I do have a completely different opinion about warnings. I use them extensively, in many projects (free software, but also in closed-source software). But I don't plan to enable this warning by default. Yet, IIUC your worries, it would be sane to diagnose missing empty-markers as soon as the marker is used in some rule, right? That would enforce consistency: either use it totally, or not at all. Makes plenty of sense! – akim Feb 03 '13 at 09:06
@akim: Without the declaration, it is not possible to specify that an empty production should be an error unless there is at least one empty production. So if I have a grammar which should not have any empty productions (a common case), the consistency check won't help me. But I'll leave it at that. As I said, I don't feel this feature is a priority, and I'm inclined to go with "if it ain't broke, don't fix it." – rici Feb 03 '13 at 17:41
Yes, you can request errors: `-Werror=empty-rule`. Thanks for your feedback anyway. Feel free to express what you think should be a priority! – akim Feb 04 '13 at 08:22

score 2 · Accepted Answer · answered Feb 01 '13 at 14:59


I have used epsilon myself, as well as variations on marker for empty productions to which I attached some code in braces.

A reserved symbol in the bison grammar would be useful; I like the proposed $ prefix to avoid collisions with user-named symbols.


David Gorsline


Thanks. We will probably use `%empty` instead (as a pseudo-symbol, not as a directive as suggested by Chris in his answer), as it is really a keyword, not a symbol (it is neither a terminal, like `$end`, nor a nonterminal, like `$accept`). – akim Feb 05 '13 at 08:01

yroeht · Answer 4 · 2013-02-01T16:52:37.960

0

1) Well, there is the obvious

e: a 'b'
a: 'a'
 | empty
empty:

2) Yes, that would be very helpful.

3) The $accept, $end and $undefined symbols are always defined, and reserved for Bison's internal use exclusively (e.g., they cannot appear in the grammar). Bison generates $@n for mid-rule actions, but these cannot be used in the user's grammar either.

The only predefined token that the user may use in the grammar, if I am not mistaken, is error. So why are you not suggesting empty for that dedicated symbol? That would have seemed fairly reasonable. Or are you suggesting introducing $error as well?

Have you considered `nothing`? I might prefer that.

edited Feb 01 '13 at 16:52

answered Feb 01 '13 at 15:40

yroeht



Your proposal is not the same as mine: your parser will have more states to reduce your "empty" non-terminal (two reductions in the end, whereas the original grammar would have only one). It's easy to see the difference if you add actions: you have two, I have one. – akim Feb 01 '13 at 16:46
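
The difference can be sketched by attaching an action to each rule. The two fragments below are alternatives to compare, not one grammar file, and the action bodies are placeholders.

```yacc
/* Variant 1: named empty nonterminal. Matching nothing costs two reductions. */
a: 'a'
 | empty        { /* runs second: a -> empty */ }
 ;
empty: /* nothing */ { /* runs first: empty -> (no symbols) */ } ;

/* Variant 2: directly empty alternative. Matching nothing costs one reduction. */
a: 'a'
 | /* empty */  { /* the only reduction: a -> (no symbols) */ }
 ;
```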

score 0 · Answer 5 · answered Feb 07 '13 at 06:19

Of course, the production is in some sense not really "empty" if it contains an action, since it's hard in Yacc/Bison to remain unaware of the fact that actions are transformed into nullable non-terminals behind the scenes. And if you (or the book) have been saying "epsilon" all semester in class, perhaps "%epsilon" has more verisimilitude than "%empty".

I muse about subsuming this into a more general assertion mechanism:

lines : %assert(epsilon)
      | %assert(on WORD) lines line ;

line : WORD '\n' ;

%assert(nullable(lines))
%assert(!nullable(line))
%assert(WORD in FIRST(lines))
/* etc. */

The idea being to slightly decrease the pain of figuring out exactly what language yacc/bison has actually implemented after all the heuristics have kicked in. The rest would work more or less as you specified, an option to warn of "empty" rules, unless the "empty" rule contained %assert(epsilon).
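
For comparison, here is the same `lines` grammar written with `%empty`, with the facts that the hypothetical `%assert` lines would check written out as comments. This is only a sketch: `%assert` does not exist in Bison, and the comments are derived by hand from the grammar.

```yacc
%token WORD

%%

lines: %empty          /* lines is nullable through this alternative */
     | lines line
     ;

line: WORD '\n' ;      /* line is not nullable: it always consumes WORD '\n' */

/* FIRST(lines) = { WORD }: lines => lines line => line => WORD '\n' */
```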

In terms of priorities, I would think it a much higher priority for bison to report when it provably has created a parser that could not possibly accept the input grammar (e.g., one or more productions can never fire). At least, that ability was not there last I looked, but I have a pretty old bison :-). And does it still fail to explain in English the problem of productions with common left prefixes that differ by embedded actions? Unless it's gotten a lot better, I would think there's a lot of explanatory improvement left to do that would help more than a check for unintentionally empty rules.

It would be interesting to see some data on the most common errors students run into (I guess I would not have picked this one as a contender!). That would be kind of an interesting experiment: hack the student copy of bison so it sends every run into a database, use some software to clean it up and analyze the most common misunderstandings.

Bison can provide you with some implementation details, such as the FIRSTs. Have a look at `bison --trace=help` to get the list of possible topics to spy on, especially `--trace=sets`. Maybe it could go into the `.output` file instead of staying hidden for maintainers only. – akim Feb 07 '13 at 20:50
