
I was wondering if there is a way to construct a parse tree during LL(1) parsing. I've been trying for days, but have been unable to find a solution. This question is similar, but it doesn't provide enough implementation details, and it is for the AST, not the parse tree.

Details:

  • I am using a stack
xilpex
  • 3,097
  • 2
  • 14
  • 45
  • A parse tree is one case of an AST (in which nothing is abstracted away :-) ) and both of the solutions proposed in the linked answer actually produce parse trees. Did you try one of them? (I don't think you can say that an answer with actual working code "doesn't provide enough implementation details". It might not be written in the language you want, but in that case you need to specify which languages are acceptable :-) and provide enough working code to avoid the inevitable "we don't write code for you" snarky comments. – rici May 01 '20 at 20:55
  • (Of course, those snarky comments are not correct. Lots of people provide code on a golden plate here, even though they probably shouldn't since it just encourages bad questions. But that won't usually happen unless the code is really easy to write.) – rici May 01 '20 at 20:58
  • @rici -- Yes, I did try the first one. The problem is it didn't have enough implementation details, like when to go out of the current AST. – xilpex May 01 '20 at 20:58
  • You leave the current AST node when it is complete. The skeleton is the right-hand side of the production, and each time the AST node is encountered you fill in one symbol. So you know when you get to the end. – rici May 01 '20 at 21:01
  • There's an algorithm suggested in [this answer](https://stackoverflow.com/a/54751222/1566221); it starts about in the middle of the answer. – rici May 01 '20 at 21:04
  • @rici -- Thanks! I get the feeling that because of that bottom-up parsers will be much faster because you *know* the children nodes, so then you don't have to include an extra step. Is this correct? – xilpex May 01 '20 at 21:09
  • Yes, I'm not much of a fan of top-down. Although it has its place. – rici May 01 '20 at 21:12
  • Let me phrase that differently. In a recursive descent parser, the grammar is implemented in the control flow of the parser. That's a simple and powerful idea but it has limitations: LL(1) is a severe restriction; control flow can be ad hoc; you need to worry about stack overflow, etc. But it is a simple and quick solution for many common parsing problems. However. If you want to build an AST from a table-driven parser, there is no earthly reason to accept those limitations. Use a mature available parser generator and concentrate on the interesting stuff, which comes after parsing. – rici May 01 '20 at 21:22
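The node-filling approach rici describes — keep a stack of partially-built tree nodes alongside the usual LL(1) symbol stack, and attach each symbol to the node on top as it is expanded or matched — can be sketched like this (the grammar, table layout, and names are illustrative, not taken from the question):

```python
# Each stack entry pairs a grammar symbol with the tree node it belongs to.
# Expanding a nonterminal creates a child node; matching a terminal attaches
# the token. A node is "left" implicitly once all symbols of its production
# have been popped off the stack.
RULES = {
    ("S", '('): ['(', "S", '+', "S", ')'],   # S -> ( S + S )
    ("S", 'n'): ["N"],                       # S -> N
    ("N", 'n'): ['n'],                       # N -> n
}

def parse(tokens):
    root = {"sym": "<root>", "children": []}   # dummy holder for the result
    stack = [("S", root)]                      # start symbol, attached to root
    pos = 0
    while stack:
        sym, parent = stack.pop()
        if sym.isupper():                      # nonterminal: expand via LL(1) lookahead
            node = {"sym": sym, "children": []}
            parent["children"].append(node)
            for s in reversed(RULES[(sym, tokens[pos])]):
                stack.append((s, node))        # push RHS reversed, so leftmost is on top
        else:                                  # terminal: match input, attach token
            assert sym == tokens[pos], "parse error"
            parent["children"].append(sym)
            pos += 1
    return root["children"][0]
```

Because the right-hand side is pushed with the leftmost symbol on top, children are attached to each node in left-to-right order, and the parse tree comes out fully built when the stack empties.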

1 Answer


I would do it this way:

Example grammar:

statement -> number
statement -> '(' statement '+' statement ')'
number -> 'n'

translates into:

RULES = [
    ["number"],                                  # 0: statement -> number
    ['(', "statement", '+', "statement", ')'],   # 1: statement -> '(' statement '+' statement ')'
    ['n'],                                       # 2: number -> 'n'
]

Performing LL(1) parsing on "((n+n)+n)" yields the sequence of rules applied, in order: [1, 1, 0, 2, 0, 2, 0, 2].
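The answer assumes you already have this rule list; a minimal table-driven LL(1) parser that produces it could look like the following sketch (the parse table `TABLE` and function names are my own, not part of the answer):

```python
RULES = [
    ["number"],                                  # 0: statement -> number
    ['(', "statement", '+', "statement", ')'],   # 1: statement -> ( statement + statement )
    ['n'],                                       # 2: number -> 'n'
]

# LL(1) parse table: TABLE[nonterminal][lookahead] -> rule index
TABLE = {
    "statement": {'n': 0, '(': 1},
    "number":    {'n': 2},
}

def ll_parse(tokens):
    stack = ["statement"]          # start symbol
    pos = 0
    applied = []                   # rule indices, in the order they are used
    while stack:
        top = stack.pop()
        if top in TABLE:           # nonterminal: expand via the table
            rule = TABLE[top][tokens[pos]]
            applied.append(rule)
            stack.extend(reversed(RULES[rule]))  # push RHS, leftmost on top
        else:                      # terminal: must match the input
            assert top == tokens[pos], "parse error"
            pos += 1
    return applied

ll_parse(list("((n+n)+n)"))  # -> [1, 1, 0, 2, 0, 2, 0, 2]
```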

Now you can build the tree:

def tree(input, ll_output):
    rule = ll_output.pop(0)        # next rule used in the derivation
    tokens = []
    for item in RULES[rule]:
        if len(item) > 1:          # nonterminal: names are multi-char, terminals single-char
            tokens.append(tree(input, ll_output))
        else:                      # terminal: consume one input token
            tokens.append(input.pop(0))
    return tokens

input = ['(','(','n','+','n',')','+','n',')'] # "((n+n)+n)"
ll_output = [1,1,0,2,0,2,0,2]
tree(input, ll_output)
# -> ['(', ['(', [['n']], '+', [['n']], ')'], '+', [['n']], ')']
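If you want each subtree tagged with the nonterminal that produced it, rather than bare nested lists, a small variant of the same idea works (the `RULE_NAMES` list and function name are my additions, assuming the same `RULES` as above):

```python
RULES = [
    ["number"],                                  # 0: statement -> number
    ['(', "statement", '+', "statement", ')'],   # 1: statement -> ( statement + statement )
    ['n'],                                       # 2: number -> 'n'
]
RULE_NAMES = ["statement", "statement", "number"]  # left-hand side of each rule

def labeled_tree(input, ll_output):
    rule = ll_output.pop(0)
    children = []
    for item in RULES[rule]:
        if len(item) > 1:                       # nonterminal (multi-char name)
            children.append(labeled_tree(input, ll_output))
        else:                                   # terminal (single char)
            children.append(input.pop(0))
    return (RULE_NAMES[rule], children)

labeled_tree(list("(n+n)"), [1, 0, 2, 0, 2])
# -> ('statement', ['(', ('statement', [('number', ['n'])]), '+', ('statement', [('number', ['n'])]), ')'])
```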
Hexception
  • 722
  • 10
  • 25