
I'm trying to parse a small pseudo-code language I'm writing and am having trouble getting the values of symbol tokens. The input parses successfully, but the resulting tree doesn't give me a value the way it would for "regular" characters. Here's an example:

>>> from lark import Lark
>>> parser = Lark('operator: "<" | ">" | "=" | ">=" | "<=" | "!="', start="operator")
>>> parsed = parser.parse(">")
>>> parsed
Tree(operator, [])
>>> parsed.data
'operator'
>>> parsed.value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Tree' object has no attribute 'value'

Why wouldn't there be a value? Is there another way to get the exact operator that was used?

Mike Shultz

2 Answers

Author of Lark here. Mike's answer is accurate, but a better way to get the same result is to use the "!" prefix on the rule, which keeps all tokens for that rule only rather than for the whole grammar:

>>> from lark import Lark
>>> parser = Lark('!operator: "<" | ">" | "=" | ">=" | "<=" | "!="', start="operator")
>>> parser.parse(">")
Tree(operator, [Token(__MORETHAN, '>')])
Erez

It appears that, by default, Lark removes "tokens" (or what it considers punctuation marks) from the tree. Luckily, there is an option called keep_all_tokens that changes this behavior.

Here's an example with that option:

>>> from lark import Lark
>>> parser = Lark('operator: "<" | ">" | "=" | ">=" | "<=" | "!="', start="operator", keep_all_tokens=True)
>>> parsed = parser.parse(">")
>>> parsed
Tree(operator, [Token(__MORETHAN, '>')])
>>> parsed.children[0].value
'>'
Mike Shultz