I'm writing a parser for mathematical expressions that turns text into an abstract syntax tree (AST), and I don't have much experience with this.

I've read on Wikipedia that the shunting-yard algorithm can parse a linear sequence of tokens either into Reverse Polish notation or directly into an AST, but I was not able to find any examples of direct infix-to-AST parsing with shunting-yard.

Right now I'm using shunting-yard to convert from infix to postfix notation and then building an AST from that output.

Is it good practice to convert the expression to postfix notation and then build an AST from it, or am I being a bit clumsy?

Gark Garcia
  • 2
    It doesn't sound right to me; you should be able to create the AST in one fell swoop. Have you written down some examples? Conceptually, parsing an expression into an AST is simple: you find an operator (let's say a binary one) and it becomes the parent of the two expressions on either side. – Countingstuff Dec 18 '18 at 19:02

1 Answer

To make the shunting-yard algorithm produce an AST directly, change the output from a sequence of tokens to a stack of nodes.

When a number, variable, or other terminal is encountered in the input, it is converted to a leaf node and pushed onto the output stack. When an operator is encountered, it's pushed onto the operator stack as normal.

The biggest change is what happens when an operator is popped off the operator stack. If it's a binary operator, the top two nodes are popped off the output stack (the right operand first, then the left), a new binary node is constructed with these nodes as children, and that node is pushed back onto the output stack.

In pseudocode:

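Stack<Node> output          // finished subtrees, not raw tokens
Stack<Operator> operators   // pending operators, as in plain shunting-yard

// Called wherever plain shunting-yard would move an operator to the output
function popOperator
    Operator op = operators.pop()
    Node right = output.pop()   // pushed last, so it is the right operand
    Node left = output.pop()
    Node n = makeNode( op, left, right )
    output.push(n)              // the new subtree becomes an operand itself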
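
To show how the pieces above fit together, here is a minimal Python sketch of the whole loop for binary operators only (no parentheses or unary minus). The names `Leaf`, `BinaryNode`, `PRECEDENCE`, `tokenize` and `parse` are my own choices for illustration, the precedence table is an assumption, and all four operators are assumed to be left-associative.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: str  # a number or variable name, kept as text


@dataclass
class BinaryNode:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, BinaryNode]

# Assumed precedence table; every operator here is treated as left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(text):
    # Hypothetical tokenizer: numbers, identifiers and the four operators.
    # Anything else (including whitespace) is silently skipped.
    return re.findall(r"\d+\.?\d*|[A-Za-z_]\w*|[-+*/]", text)


def parse(text):
    output = []     # stack of nodes
    operators = []  # stack of operator symbols

    def pop_operator():
        op = operators.pop()
        right = output.pop()  # right operand was pushed last
        left = output.pop()
        output.append(BinaryNode(op, left, right))

    for token in tokenize(text):
        if token in PRECEDENCE:
            # Pop operators that bind at least as tightly (left-associative).
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]:
                pop_operator()
            operators.append(token)
        else:
            output.append(Leaf(token))

    while operators:
        pop_operator()
    return output.pop()  # the single remaining node is the root
```

With this sketch, `parse("1 + 2 * 3")` returns `BinaryNode("+", Leaf("1"), BinaryNode("*", Leaf("2"), Leaf("3")))`, so the multiplication ends up deeper in the tree and is evaluated first. Malformed input such as `"1 +"` is not handled and will raise an `IndexError`.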
Salix alba
  • How could I deal with parentheses in this case? – Gark Garcia Dec 25 '18 at 23:20
  • 1
    For parentheses you don't have a specific operator in the AST; instead, the structure of the tree determines evaluation order, which always happens depth first. The code is basically unchanged. When a left bracket is encountered in the input, push it onto the operator stack. When a right bracket is encountered, keep popping operators using the above method until the matching left bracket is on top of the operator stack. Then just pop the left bracket and do nothing to the output tree. – Salix alba Dec 26 '18 at 10:10
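
The bracket handling described in the comment above could look roughly like this in Python. This is only a sketch under some assumptions of mine: the input is already split into a list of tokens, nodes are plain `(op, left, right)` tuples with strings as leaves, and the `PRECEDENCE` table and the name `parse` are made up for illustration. All operators are assumed to be binary and left-associative.

```python
# Assumed precedence table; every operator here is treated as left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse(tokens):
    output = []     # stack of nodes: leaf strings or (op, left, right) tuples
    operators = []  # stack of operator symbols and "(" markers

    def pop_operator():
        op = operators.pop()
        right = output.pop()  # right operand was pushed last
        left = output.pop()
        output.append((op, left, right))

    for token in tokens:
        if token == "(":
            operators.append(token)
        elif token == ")":
            # Build nodes until the matching "(" is on top of the stack.
            while operators and operators[-1] != "(":
                pop_operator()
            if not operators:
                raise ValueError("unbalanced ')'")
            operators.pop()  # discard "(" without creating any node
        elif token in PRECEDENCE:
            # A "(" on the stack acts as a barrier for precedence popping.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                pop_operator()
            operators.append(token)
        else:
            output.append(token)  # number or variable becomes a leaf

    while operators:
        if operators[-1] == "(":
            raise ValueError("unbalanced '('")
        pop_operator()
    return output.pop()
```

For example, `parse(["(", "1", "+", "2", ")", "*", "3"])` returns `("*", ("+", "1", "2"), "3")`: the brackets leave no node of their own, they only change which subtree sits underneath the multiplication.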