Problem:

# From example at https://github.com/lark-parser/lark/blob/master/examples/json_parser.py
import sys

from lark import Lark, Transformer, v_args

json_grammar = r""" ... """

# TreeToJson (a Transformer subclass) is defined in the linked example; omitted here.

### Create the JSON parser with Lark, using the LALR algorithm
json_parser = Lark(json_grammar, parser='lalr',
                   # Using the standard lexer isn't required, and isn't usually recommended.
                   # But, it's good enough for JSON, and it's slightly faster.
                   lexer='standard',
                   # Disabling propagate_positions and placeholders slightly improves speed
                   propagate_positions=False,
                   maybe_placeholders=False,
                   # Using an internal transformer is faster and more memory efficient
                   transformer=TreeToJson())
# Must come after json_parser is created
parse = json_parser.parse

with open(sys.argv[1]) as f:
    tree = parse(f.read())
    print(tree)
    # Errors next 2 lines:
    # No: tree.pretty( indent_str="  " )
    # No: Lark.pretty( indent_str="  " )

Specific Error:

  • AttributeError: type object 'Lark' has no attribute 'pretty'

Setup:

Python version = 3.8.1

In Miniconda 3 on macOS Big Sur

conda install lark-parser

Installed 0.11.2-pyh44b312d_0

conda upgrade lark-parser

Installed 0.11.3-pyhd8ed1ab_0

Edit: Note about my Goal:

The goal here is NOT just to parse JSON; I just happen to be using a JSON example to try and learn. I want to write my own grammar for some data that I'm dealing with at work.

Edit: Why I Believe Pretty Print Should Exist:

Here's an example that uses the .pretty() function, and even includes output. But I can't seem to find anything (via conda at least) that includes .pretty(): http://github.com/lark-parser/lark/blob/master/docs/json_tutorial.md

Mark Bennett
  • Why do you expect it to have a `pretty` method? – rici May 11 '21 at 21:31
  • @rici numerous examples on the web, including example I included – Mark Bennett May 11 '21 at 21:58
  • The example you included does not invoke pretty anywhere. It uses a TreeTransformer to turn the value returned from parse into an ordinary Python object (otherwise the assert at the end of test wouldn't work), and ordinary Python objects don't have pretty methods. – rici May 11 '21 at 22:02
  • @rici here's an example that uses .pretty() from Lark: https://github.com/lark-parser/lark/blob/master/docs/json_tutorial.md – Mark Bennett May 12 '21 at 00:42
  • yes, I believe that code will work as expected. But, to be slightly more precise, it uses `Tree.pretty()`, which is a method of the `Tree` class defined in the `lark` package. The `Lark` class itself has no `pretty` method. – rici May 12 '21 at 03:05
  • Here's where `pretty` is defined, FWIW: https://github.com/lark-parser/lark/blob/master/lark/tree.py#L60 – rici May 12 '21 at 03:08

2 Answers

I am not sure what I can put in this answer that is not already in the other answer. I will just try to create corresponding examples:

json_parser = Lark(json_grammar, parser='lalr',
                   # Using the standard lexer isn't required, and isn't usually recommended.
                   # But, it's good enough for JSON, and it's slightly faster.
                   lexer='standard',
                   # Disabling propagate_positions and placeholders slightly improves speed
                   propagate_positions=False,
                   maybe_placeholders=False,
                   # Using an internal transformer is faster and more memory efficient
                   transformer=TreeToJson()
)

The important line here is transformer=TreeToJson(). It tells Lark to apply the Transformer class TreeToJson before returning the result to you, so what you get back is no longer a Tree. If you remove that line:

json_parser = Lark(json_grammar, parser='lalr',
                   # Using the standard lexer isn't required, and isn't usually recommended.
                   # But, it's good enough for JSON, and it's slightly faster.
                   lexer='standard',
                   # Disabling propagate_positions and placeholders slightly improves speed
                   propagate_positions=False,
                   maybe_placeholders=False,
)

Then you get the Tree instance with the .pretty method:

tree = json_parser.parse(test_json)
print(tree.pretty())

You can then apply the Transformer manually:

res = TreeToJson().transform(tree)

This is now a 'normal' Python object, like you would get from the stdlib json module, so probably a dictionary.

The transformer= option of the Lark constructor applies the transformer during parsing, so no intermediate Tree is ever built, which saves time and memory.

MegaIng
  • Wow! OK, so I think it was the varying return types that messed me up for sure (and when transforms are, or are not, run). Need to digest answer a bit more and revisit my code. – Mark Bennett May 12 '21 at 23:45

The JSON parser in the Lark examples directory uses a tree transformer to turn the parsed tree into an ordinary JSON object. That makes it possible to verify that the parse is correct by comparing it with the JSON parser in Python's standard library:

    j = parse(test_json)
    print(j)
    import json
    assert j == json.loads(test_json)

The assert at the end could only pass if the value returned by parse had the same type as the object returned by json.loads, which is an ordinary unadorned Python builtin type, typically dict or list.

You might find the pretty printer in the Python standard library (the pprint module) useful for this particular application. Or you could use the standard library's json.dumps function with a non-zero indent keyword argument. (E.g.: print(json.dumps(json_value, indent=2)))
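As a minimal sketch of both suggestions, using only the standard library (json_value here is a made-up stand-in for whatever the transformer-based parser returned):

```python
import json
from pprint import pformat

# Hypothetical stand-in for the plain object returned by parse().
json_value = {"name": "lark", "tags": ["parser", "lalr"], "version": 1}

# Option 1: json.dumps with an indent produces indented JSON text.
pretty_json = json.dumps(json_value, indent=2)
print(pretty_json)

# Option 2: pprint formats the Python object itself (Python repr, not JSON).
pretty_repr = pformat(json_value, width=40)
print(pretty_repr)
```

Note that pprint shows Python syntax (single quotes, True/None), while json.dumps emits valid JSON, so the choice depends on who will read the output.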

rici
  • Thanks for the out-of-the-box answer, but it's not specifically parsing JSON that I'm trying to accomplish. The bigger goal is to use the Lark system on some custom data to parse it - but was using the JSON example to learn. – Mark Bennett May 12 '21 at 00:41
  • @mark: what you can learn from the JSON example, specifically, is that a `transformer` passed to the `Lark` constructor allows the parser to return any convenient datatype, not just objects of class `lark.Tree`, which is what the parser returns by default. The `lark.Tree` class provides a `pretty` method. But since the JSON example does not return a `Tree` object, there is no method by that name. That's not specific to JSON parsing; you can use, or not use, a transformer with any grammar. The code in the tutorial you cite doesn't use one. – rici May 12 '21 at 02:59
  • @Mark: My "out of the box" answer starts by mentioning the transformer and then explains one of the motivations for using it, quoting the relevant code. It then suggests some relevant solutions for the issue you reported. I'm honestly not sure what more I could have added, but I'm open to suggestions. – rici May 12 '21 at 03:02
  • Hi @rici see the next answer; I think they're saying something similar to what you said, but somehow reading it twice in different wording maybe helped it sink in. The issue of returned object types not being more prominently displayed in the doc is maybe a problem. – Mark Bennett May 12 '21 at 23:43
  • @MarkBennett: the fact that it starts by saying "I am not sure what I can put in this answer that is not already in the other answer" seems to indicate that it's saying exactly the same thing, not just something similar. But if it was somehow easier to understand, that's cool. The documentation of Lark is not its best feature IMHO, but on the plus side it's short, so reading it twice doesn't take so long :-). Good luck with the project. – rici May 12 '21 at 23:59