If you specify %token-table, then bison will generate the yytname table. This table includes all bison symbols: the internal symbols ($end, error and $undefined), terminals (named tokens, single-quoted characters and double-quoted strings) and non-terminals, which also include the generated names for mid-rule actions.
With yytname visible, it's easy to extract the tokens in a format recognizable by the gettext package. For example, you could add something like this to your .y file:
#ifdef MAKE_TOKEN
/* Built with -DMAKE_TOKEN, this emits a C file that xgettext can scan. */
int main(void) {
    puts("#include <libintl.h>");
    puts("#include <stdio.h>");
    puts("int main() {");
    for (const char* const* p = yytname; *p; ++p) {
        // See Note 1 below
        printf("  printf(\"%%s: %%s\\n\", \"%s\", gettext (\"%s\"));\n", *p, *p);
    }
    puts("}");
    return 0;
}
#endif
and then add a stanza to your Makefile (making appropriate substitutions for file names):
messages.pot: my_parser.c
	$(CC) $(CFLAGS) -DMAKE_TOKEN -o token_lister $<
	./token_lister > my_parser.tokens.c
	# See Note 2 below
	$(CC) -o my_parser.tokens my_parser.tokens.c
	xgettext -o $@ my_parser.tokens.c
Once you have the translations, you still need to figure out how to use them, since bison does not offer an interface for inserting translated token names into its generated error messages. Probably the simplest way is to insert the translations directly into yytname at parser startup, by iterating through that array and replacing each token name with its translation. The annoyance is that yytname is declared const by the bison skeleton; however, a very simple sed or awk invocation can be used to remove the offending const. [Note 3]
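As a rough illustration of that startup substitution, here is a minimal sketch. It assumes the const has already been stripped so the table is writable, and that the table ends with a null pointer (which the generator loop above also relies on). The names translate_token_names, translate_fn and demo_translate are hypothetical; demo_translate is only a stand-in so the sketch works without a message catalog, and its signature is chosen to match gettext() so that gettext itself could be passed instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as gettext(): char *gettext(const char *msgid). */
typedef char *(*translate_fn)(const char *);

/*
 * Replace each entry of a null-terminated name table with its translation.
 * Returns the number of entries that were actually replaced.
 * (Hypothetical helper; bison does not provide anything like it.)
 */
size_t translate_token_names(const char **names, translate_fn translate)
{
    size_t changed = 0;

    assert(names != NULL && translate != NULL);
    for (const char **p = names; *p != NULL; ++p) {
        const char *translated = translate(*p);
        /* gettext() returns its argument unchanged when it has no translation. */
        if (translated != NULL && translated != *p) {
            *p = translated;
            ++changed;
        }
    }
    return changed;
}

/* Stand-in for gettext(), for illustration only: knows a single "translation". */
char *demo_translate(const char *msgid)
{
    static char et[] = "\"et\"";

    if (strcmp(msgid, "\"and\"") == 0)
        return et;
    return (char *)msgid;
}
```

In a real parser the call would presumably look like translate_token_names(yytname, gettext), made once before the first call to yyparse and after the usual setlocale and textdomain setup.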
Having said that, it's not at all clear to me that these automatically generated error messages are "user friendly", unless the user is surprisingly familiar with the language's formal grammar. And a user who is familiar with the grammar might well prefer the original token name, in order to find it in the grammar, rather than a non-expert translation which only coincidentally resembles the original concept. Not that I'm pointing fingers at anyone in particular.
You might enjoy this fascinating essay by Russ Cox, about how he implemented actually friendly error messages for Go.
NOTES:
1. The direct use of the token name in a C string won't work for tokens whose representation includes a " or a \. In particular, any keyword token ("and" or "<=") will fail, since yytname stores such names with their surrounding double quotes, as will the single-character tokens '"' and '\\'. These don't show up very often in grammars; if you're substituting internationalized keywords in your scanner, you're very unlikely to use bison's quoted string feature at all. If you do want to use such tokens, you'll have to output code for the gettext generator which escapes " and \ characters in the token name.
2. Actually, it would be better to use several stanzas, but that one is enough to get you going, I think. You probably want to mark some or all of the intermediate results as .INTERMEDIATE. The generated executable my_parser.tokens can be used to verify the translations, but that's totally optional, so you might want to remove that line. On the other hand, building it does verify that the strings are compilable.
3. See Russ Cox's gc (link provided above) for an example. His Makefile modifies the bison output to remove the const from yytname, so that the generated parser can substitute his preferred token names in error messages, so you can see the general idea at work.
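For the escaping mentioned in Note 1, something along these lines might do. It is only a sketch: escape_for_c_literal is a hypothetical helper, and it handles just the two characters the note calls out, not every character that is special inside a C string literal.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy src into dst, putting a backslash in front of every " and \ so that
 * the result can be placed between double quotes in generated C source.
 * Returns the escaped length, not counting the terminating NUL. A return
 * value >= dst_size means the output was truncated and should not be used.
 * (Hypothetical helper, for illustration only.)
 */
size_t escape_for_c_literal(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL);
    for (; *src != '\0'; ++src) {
        if (*src == '"' || *src == '\\') {
            if (dst != NULL && n + 1 < dst_size)
                dst[n] = '\\';
            ++n;
        }
        if (dst != NULL && n + 1 < dst_size)
            dst[n] = *src;
        ++n;
    }
    if (dst != NULL && dst_size > 0)
        dst[n < dst_size ? n : dst_size - 1] = '\0';
    return n;
}
```

In the MAKE_TOKEN generator, each name from yytname would go through this function into a local buffer first, and the buffer rather than *p would be passed to the two %s conversions.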