
The TatSu grammar below (TatSu 5.8.3, Python 3.11) produces unexpected output for the given input: I expected xxx yy to be nested, but the brackets [] are completely ignored:

@@grammar :: Test
@@whitespace :: /[\t ]+/

start = script ;
script = commands $ ;
commands = { commands:command terminator }* [ commands:command ] ;
command = { command:word }* ; 

word = (lbrack command rbrack) | text ;

text = /[A-Za-z0-9]+/ ;

terminator = (colon | newline) ;
lbrack = '[' ;
rbrack = ']' ;
colon = ';' ;
newline = '\n' ;
any = /./ ;


Input:
set a [xxx yy]; get b uu 79
set c 45

Output:
{
  "commands": [
    {
      "command": [
        "set",
        "a",
        "xxx",
        "yy"
      ]
    },
    {
      "command": [
        "get",
        "b",
        "uu",
        "79"
      ]
    },
    {
      "command": [
        "set",
        "c",
        "45"
      ]
    }
  ]
}

If I use () instead of [], the output looks as expected with xxx yy nested:

Output:
{
  "commands": [
    {
      "command": [
        "set",
        "a",
        [
          "(",
          {
            "command": [
              "xxx",
              "yy"
            ]
          },
          ")"
        ]
      ]
    },
    {
      "command": [
        "get",
        "b",
        "uu",
        "79"
      ]
    },
    {
      "command": [
        "set",
        "c",
        "45"
      ]
    }
  ]
}

This is the Python script to reproduce:

import json
import tatsu
from tatsu.util import asjson

grammar = """
@@grammar :: Test
@@whitespace :: /[\\t ]+/

start = script ;
script = commands $ ;
commands = { commands:command terminator }* [ commands:command ] ;
command = { command:word }* ; 

word = (lbrack command rbrack) | text ;

text = /[A-Za-z0-9]+/ ;

terminator = (colon | newline) ;
lbrack = '[' ;
rbrack = ']' ;
colon = ';' ;
newline = '\\n' ;
any = /./ ;
"""

# Named `text` rather than `input` to avoid shadowing the built-in.
text = "set a [xxx yy]; get b uu 79\nset c 45"


if __name__ == '__main__':
  print(f"TatSu Version: {tatsu._version.__version__}\n")
  print(f"Grammar:\n{grammar}\n")
  print(f"Input:\n{text}\n")

  # Compile the grammar into a parser model, then parse the sample input.
  parser = tatsu.compile(grammar)
  ast = parser.parse(text)

  print(f"Output:\n{json.dumps(asjson(ast), indent=2)}")

What am I doing wrong?

Painter
  • You can use the `trace` option to look at what the grammar is actually parsing. I don't think the grammar represents the language you want to parse. – Apalala Apr 14 '23 at 14:05
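
Alongside tracing, one thing that may be worth ruling out is how the @@whitespace pattern is interpreted. The sketch below uses only Python's `re` module and does not call TatSu. It compares two readings of the directive text: as a regular expression, and, as an unconfirmed assumption about what might be happening, as a plain set of characters wrapped in a character class. Under the second reading the characters `[` and `]` would be skipped like spaces, which would match the flat output in the question and would also explain why `(` and `)` are unaffected. The `skip` helper and the variable names are illustrative and not part of TatSu.

```python
import re

# The pattern text exactly as written in the @@whitespace directive.
ws_source = r"[\t ]+"

# Reading 1: the text is used as a regular expression (tabs and spaces only).
ws_as_regex = re.compile(ws_source)

# Reading 2 (hypothetical): the text is treated as a set of characters and
# wrapped in a character class, so '[', ']', '+', '\' and 't' are skipped too.
ws_as_charset = re.compile("[%s]+" % re.escape(ws_source))


def skip(pattern, text, pos):
    """Return the position after any whitespace matched at pos."""
    match = pattern.match(text, pos)
    return match.end() if match else pos


line = "set a [xxx yy]; get b uu 79"
pos = line.index("[") - 1  # the space just before '['

after_regex = skip(ws_as_regex, line, pos)
after_charset = skip(ws_as_charset, line, pos)

print(repr(line[after_regex:after_regex + 4]))      # next token starts at '['
print(repr(line[after_charset:after_charset + 4]))  # '[' was skipped as whitespace
```

If the trace output never shows an attempt to match '[' at that position, this reading becomes more plausible; if it does, the cause lies elsewhere in the grammar.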
