The TatSu grammar below (TatSu 5.8.3, Python 3.11) produces unexpected output for the given input: I expected xxx yy to be nested, but the brackets [] are completely ignored:
@@grammar :: Test
@@whitespace :: /[\t ]+/
start = script ;
script = commands $ ;
commands = { commands:command terminator }* [ commands:command ] ;
command = { command:word }* ;
word = (lbrack command rbrack) | text ;
text = /[A-Za-z0-9]+/ ;
terminator = (colon | newline) ;
lbrack = '[' ;
rbrack = ']' ;
colon = ';' ;
newline = '\n' ;
any = /./ ;
Input:
set a [xxx yy]; get b uu 79
set c 45
Output:
{
  "commands": [
    {
      "command": [
        "set",
        "a",
        "xxx",
        "yy"
      ]
    },
    {
      "command": [
        "get",
        "b",
        "uu",
        "79"
      ]
    },
    {
      "command": [
        "set",
        "c",
        "45"
      ]
    }
  ]
}
If I use () instead of [], the output looks as expected with xxx yy nested:
Output:
{
  "commands": [
    {
      "command": [
        "set",
        "a",
        [
          "(",
          {
            "command": [
              "xxx",
              "yy"
            ]
          },
          ")"
        ]
      ]
    },
    {
      "command": [
        "get",
        "b",
        "uu",
        "79"
      ]
    },
    {
      "command": [
        "set",
        "c",
        "45"
      ]
    }
  ]
}
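To compare the two outputs programmatically rather than by eye, a small helper along these lines could be used. It is only a sketch: `has_nested_command` and the two abbreviated sample dictionaries are names made up for illustration and are not part of TatSu. It operates on the plain JSON-style structure shown above.

```python
def has_nested_command(node, depth=0):
    """Return True if a "command" key occurs inside another "command" value.

    Hypothetical helper for inspecting the JSON-style AST; not a TatSu API.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "command":
                if depth >= 1:
                    # Found a "command" while already inside one: nested.
                    return True
                if has_nested_command(value, depth + 1):
                    return True
            elif has_nested_command(value, depth):
                return True
        return False
    if isinstance(node, list):
        return any(has_nested_command(item, depth) for item in node)
    # Strings, numbers, None: leaves cannot contain a nested command.
    return False


# Abbreviated versions of the two outputs shown above.
flat_output = {"commands": [{"command": ["set", "a", "xxx", "yy"]}]}
nested_output = {
    "commands": [
        {"command": ["set", "a", ["(", {"command": ["xxx", "yy"]}, ")"]]}
    ]
}

print(has_nested_command(flat_output))
print(has_nested_command(nested_output))
```

With the flat `[]` result the helper reports no nesting, while the `()` result is reported as nested, which is the difference I am asking about.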
This is the Python script to reproduce the problem:
import json
import tatsu
from tatsu.util import asjson
grammar = """
@@grammar :: Test
@@whitespace :: /[\\t ]+/
start = script ;
script = commands $ ;
commands = { commands:command terminator }* [ commands:command ] ;
command = { command:word }* ;
word = (lbrack command rbrack) | text ;
text = /[A-Za-z0-9]+/ ;
terminator = (colon | newline) ;
lbrack = '[' ;
rbrack = ']' ;
colon = ';' ;
newline = '\\n' ;
any = /./ ;
"""
# Named "text" rather than "input" to avoid shadowing the built-in input().
text = "set a [xxx yy]; get b uu 79\nset c 45"

if __name__ == '__main__':
    print(f"TatSu Version: {tatsu._version.__version__}\n")
    print(f"Grammar:\n{grammar}\n")
    print(f"Input:\n{text}\n")
    # Compile the grammar into a parser, then parse the sample input.
    parser = tatsu.compile(grammar)
    ast = parser.parse(text)
    print(f"Output:\n{json.dumps(asjson(ast), indent=2)}")
What am I doing wrong?