
I have defined a very simple grammar, but tatsu does not behave as expected.

I have added a "start" rule and terminated it with a "$" character, but I still see the same behavior.

If I define the "digit" rule with a regular expression (digit = /[1-5x]/) instead of the individual terminal symbols, the problem disappears. But shouldn't the old-school BNF-like syntax below work?

from pprint import pprint
from tatsu import parse

GRAMMAR = """
@@grammar :: test
@@nameguard :: False

start = sequence $ ;
sequence = {digit}+ ;
digit = 'x' | '1' | '2' | '3' | '4' | '5' ;"""

test = "23"
ast = parse(GRAMMAR, test)
pprint(ast)  # Prints ['2', '3']

test = "xx"
ast = parse(GRAMMAR, test)
pprint(ast)  # Throws tatsu.exceptions.FailedParse: (1:1) no available options :

The "xx" test should produce "['x', 'x']" and not throw an exception.

What am I missing?

2 Answers


You probably need to check interactions with @@nameguard, which is turned on by default.

For the first version of the grammar, use:

@@nameguard :: False

You can also consider the definitions of @@whitespace and @@namechars that best suit the language and grammar.

Apalala
  • @@nameguard is set to false in the grammar. I am not sure what whitespace could have to do with this problem. – David Randolph Apr 21 '19 at 00:01
  • Also noteworthy: the issue is apparently only with alphabetic characters. Numerics and other symbols seem to work as expected. – David Randolph Apr 21 '19 at 00:42
  • @@nameguard only applies to tokens with a leading alphabetic. – Apalala Apr 21 '19 at 01:18
  • I don't understand. "x" is alphabetic. With @@nameguard turned off, as it is, the parser should know it does not need to worry about a name like "xray" or "xx" down the line. It should know that "xx" contains two terminals. Also, the documentation says @@nameguard affects **alphanumerics**. I am only dealing with digits and letters here. What should I set @@namechars to? I am only using spaces in the grammar, so the default @@whitespace should be sufficient. What specifically do I need to do so "xx" parses to ['x', 'x']? It really seems like @@nameguard is being ignored. – David Randolph Apr 21 '19 at 01:53
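
To make the comments above concrete, here is a simplified model of a name guard, written in plain Python. It is not TatSu's actual implementation; the functions `match_token` and `parse_digits` are hypothetical, and the rule they encode (an alphanumeric token with a leading letter is rejected when another alphanumeric character follows it) is an assumption based on the comments in this thread. Under that assumption, "23" is unaffected, while "xx" fails unless the guard is really switched off.

```python
def match_token(text, pos, token, nameguard=True):
    """Return the position after `token`, or None if the match is rejected."""
    if not text.startswith(token, pos):
        return None
    end = pos + len(token)
    # Assumed guard rule: only alphanumeric tokens that start with a letter
    # are checked, and they fail if another name character follows directly.
    if nameguard and token.isalnum() and token[0].isalpha():
        if end < len(text) and text[end].isalnum():
            return None
    return end


def parse_digits(text, nameguard=True):
    """Mimic `sequence = {digit}+` over the terminals from the question."""
    tokens = ["x", "1", "2", "3", "4", "5"]
    pos, result = 0, []
    while pos < len(text):
        for token in tokens:
            end = match_token(text, pos, token, nameguard)
            if end is not None:
                result.append(token)
                pos = end
                break
        else:
            # No terminal matched at this position.
            raise ValueError(f"no available options at position {pos}")
    return result


print(parse_digits("23"))                   # digits are never guarded
print(parse_digits("xx", nameguard=False))  # guard off: two separate terminals
try:
    parse_digits("xx")                      # guard on: first 'x' is rejected
except ValueError as error:
    print("failed:", error)
```

In this model, the failure reported in the question is what one would see if the guard stayed active even though the grammar sets @@nameguard to False.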

Okay, I think there is a problem with @@nameguard. See https://github.com/neogeny/TatSu/issues/95. The easy workaround for the time being is to use a pattern expression in lieu of individual alphabetic terminals. Also, when @@nameguard is fixed, the documentation should clarify that it only relates to alphanumerics that begin with an alphabetic. Clearly, we did not need @@nameguard for the numeric terminals here.
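
A minimal sketch of the pattern workaround, assuming the same grammar as in the question with only the "digit" rule changed to the pattern /[1-5x]/. The extra input "x1x" is an added illustration, and the result is wrapped in list() because the container type returned for a closure may differ between TatSu versions.

```python
from tatsu import parse

GRAMMAR = """
@@grammar :: test

start = sequence $ ;
sequence = {digit}+ ;
digit = /[1-5x]/ ;
"""

# The pattern matches one character at a time, so no token name guard
# is involved when 'x' is followed by another 'x' or by a digit.
for text in ("23", "xx", "x1x"):
    ast = parse(GRAMMAR, text)
    print(text, "->", list(ast))
```

The trade-off is that the terminals are no longer listed individually in the grammar, so the allowed characters live inside the regular expression instead.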