Hi, I just started working with the lex and yacc tools.

I realized that yyerror receives only the string "syntax error" from yacc. I was wondering if I can customize this string.

Also, can I differentiate between different types of errors? (I'm trying to report a missing token and an additional token as different errors.) If so, how should I do it?

Many thanks.

jin

1 Answer

You're free to print any message you want to in yyerror (or even no message at all), so you can customise messages as you see fit. A common customisation is to add the line number (and possibly column number) of the token which triggered the error. You can certainly change the text if you want to, but if you just want to change it to a different language, you should probably use the gettext mechanism. You'll find .po files in the runtime-po subdirectory of the source distribution. If this facility is enabled, bison will arrange for the string to be translated before it is passed to yyerror, but of course you could do the translation yourself in yyerror if that is more convenient for you.
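As a minimal sketch of the line-and-column customisation mentioned above: the code below assumes the scanner keeps two counters up to date. flex can maintain `yylineno` for you (with `%option yylineno`), in which case you would declare it `extern` rather than define it; `yycolumn` and `format_error` are hypothetical names introduced only for this illustration.

```c
#include <stdio.h>

/* Assumption: the scanner updates these as it consumes input.
   With flex's %option yylineno, yylineno is defined by the scanner,
   so declare it extern instead. yycolumn is a hypothetical counter
   you would have to maintain yourself. */
int yylineno = 1;
int yycolumn = 1;

/* Hypothetical helper: build "line:column: message" in buf. */
int format_error(char *buf, size_t size, const char *msg) {
    return snprintf(buf, size, "%d:%d: %s", yylineno, yycolumn, msg);
}

/* Called by the generated parser with the error text. */
void yyerror(const char *msg) {
    char buf[256];
    format_error(buf, sizeof buf, msg);
    fprintf(stderr, "%s\n", buf);
}
```

Keeping the formatting in a separate helper is only a convenience so that the text can be checked or redirected; printing directly from yyerror works just as well.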

I suspect that what you actually want is for bison to produce a more informative error message. Bison only has one alternative error message format, which includes a list of "expected" tokens. You can ask Bison to produce such an error message by including

%define parse.error verbose

in the declarations section of your grammar file (before the first %%). As the manual indicates, the bison parsing algorithm can sometimes produce an incorrect list of expected tokens (since it was not designed for this particular purpose); you can get a more precise list by enabling lookahead correction by also including

%define parse.lac full

This does have a minor performance penalty. See the linked manual section for details.

The list of tokens produced by this feature uses the name of the token as supplied in the bison file. These names are usually not very user-friendly, so you might find yourself generating error messages such as the infamous PHP error

syntax error, unexpected T_CONSTANT_ENCAPSED_STRING

(Note: more recent PHP versions produce a different but equally mysterious message.)

To avoid this, define double-quoted aliases for your tokens. This can also make your grammar a lot more readable:

%token <string> TOK_ID "identifier"
%token TOK_IF "if" TOK_ELSE "else" TOK_WHILE "while"
%token TOK_LSH "<<"
/* Etc. */

%%

stmt: expr ';' 
    | while 
    | if
    | /* ... */
while: "while" '(' expr ')' stmt
expr: "identifier"
    | expr "<<" expr
/* ... */

The quoted names will not be passed through gettext. That's appropriate for names which are keywords, but it might be desirable to translate descriptive token aliases. A procedure to do so is outlined in this answer.

rici
  • Thanks for the reply. I was looking more for something like these errors: `syntax error: additional token "TOKEN"` and `error: missing token "TOKEN"`. Those are two different errors, and using verbose as you said makes yacc return the same string for both. Is there any way to distinguish them? – jin Oct 29 '18 at 16:44
  • @yunho: The only way to do that would be to try inserting token X and, independently, deleting token Y, and retrying the parse to see if it worked. It might happen that neither works; it might happen that both work but neither were the programmer's intent. Anyway, you could do that yourself, but bison won't do it for you; a bison parser reads left to right and does not backtrack. – rici Oct 29 '18 at 17:24
  • @yunho: perhaps you could add some examples to your question of errors which are "clearly" of one type or the other. – rici Oct 29 '18 at 17:41
  • Thanks, but backtracking is not quite what I was looking for. The examples are: 1. `error: syntax error: additional token "TOKEN"` 2. `error: missing token "TOKEN"`. The first input has an additional token, hence an error. The second is missing a token, which is also an error but of a different type. I don't really see how backtracking can help with distinguishing the two types of errors. – jin Oct 30 '18 at 07:31
  • oh wait, I didn't see the edited comment. I will try again thanks a lot! – jin Oct 30 '18 at 07:32
  • @yunho: You already mentioned the error messages you want to produce. I was looking for specific examples of erroneous inputs which (unambiguously) trigger those errors. It's notoriously difficult to tell the difference. (Does `a[((2 + 3) * (4 + 5) >> 7]` have too many open parentheses or too few close parentheses? In the latter case, *where* should the close parenthesis go? What syntactic argument can you use to justify the answers?) – rici Oct 30 '18 at 17:25
  • @rici As of Bison 3.2, I don't think the symbols involved in syntax errors are submitted to gettext in any of the skeletons. There's definitely room for improvements in this area, granted. – akim Oct 31 '18 at 07:43
  • @akim: you're right; I misremembered. I'll fix the answer. Oddly, I was remembering a hack I myself concocted five years ago. It took me a while to find it because I'd convinced myself that it was due to Russ Cox. – rici Oct 31 '18 at 14:57
  • @akim:yes, precisely. That article is cited in my 2013 answer (and in other answers) although I now see that the code it is based on is hard to find. – rici Oct 31 '18 at 15:42
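
To make the insert-or-delete retry described in the comments concrete, here is a rough sketch. It does not use bison at all: it is a hand-written recognizer for a toy grammar (`expr := 'n' | '(' expr ')'`, one character per token), and every name in it is hypothetical. It simply tries every single-token deletion and every single-token insertion and counts how many of each make the input parse.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 64

/* Toy grammar: expr := 'n' | '(' expr ')'.
   Returns a pointer just past the parsed expr, or NULL on failure. */
static const char *parse_expr(const char *s) {
    if (*s == 'n') return s + 1;
    if (*s == '(') {
        const char *rest = parse_expr(s + 1);
        if (rest && *rest == ')') return rest + 1;
    }
    return NULL;
}

/* True if the whole string is exactly one expr. */
static int accepts(const char *s) {
    const char *end = parse_expr(s);
    return end != NULL && *end == '\0';
}

/* Count the single-token deletions that make s parse
   (evidence for an "additional token" diagnosis). */
int deletion_repairs(const char *s) {
    size_t n = strlen(s);
    char buf[MAX_TOKENS + 2];
    int count = 0;
    if (n > MAX_TOKENS) return 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(buf, s, i);
        memcpy(buf + i, s + i + 1, n - i);  /* also copies the NUL */
        if (accepts(buf)) count++;
    }
    return count;
}

/* Count the single-token insertions that make s parse
   (evidence for a "missing token" diagnosis). */
int insertion_repairs(const char *s) {
    static const char alphabet[] = "n()";
    size_t n = strlen(s);
    char buf[MAX_TOKENS + 2];
    int count = 0;
    if (n > MAX_TOKENS) return 0;
    for (size_t i = 0; i <= n; i++) {
        for (const char *t = alphabet; *t; t++) {
            memcpy(buf, s, i);
            buf[i] = *t;
            memcpy(buf + i + 1, s + i, n - i + 1);  /* also copies the NUL */
            if (accepts(buf)) count++;
        }
    }
    return count;
}
```

For the input `nn`, only deletions succeed (2 deletions, 0 insertions), so "additional token" is a defensible message; for `()`, only an insertion succeeds (0 deletions, 1 insertion), so "missing token" fits. For `((n)`, however, two deletions and two insertions all succeed, which is exactly the ambiguity raised in the comments: the input does not tell you whether there is one open parenthesis too many or one close parenthesis too few.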