
I'm currently transitioning from Java to Python and have taken on the task of creating a calculator that can carry out symbolic operations on infix-notated mathematical expressions (without using third-party modules like SymPy). Currently, it accepts space-delimited strings and supports only the (, ), +, -, *, and / operators. Unfortunately, I can't figure out the basic algorithm for simplifying symbolic expressions.

For example, given the string '2 * ( ( 9 / 6 ) + 6 * x )', my program should carry out the following steps:

  1. 2 * ( 1.5 + 6 * x )
  2. 3 + 12 * x

But I can't get the program to ignore the x when distributing the 2. In addition, how can I handle 'x * 6 / x' so it returns '6' after simplification?

EDIT: To clarify, by "symbolic" I meant that it will leave letters like "A" and "f" in the output while carrying out the remaining calculations.

EDIT 2: I (mostly) finished the code. I'm posting it here if anyone stumbles on this post in the future, or if any of you were curious.

    def reduceExpr(useArray):

        # Use Python's native eval() to compute if no letters are detected.
        if (not hasLetters(useArray)):
            return [calculate(useArray)] # Different from eval() because it returns string version of result

        # Base case. Returns useArray if the list size is 1 (i.e., it contains one string). 
        if (len(useArray) == 1):
            return useArray

        # Base case. Returns the space-joined elements of useArray as a list with one string.
        if (len(useArray) == 3):
            return [' '.join(useArray)]

        # Checks whether parentheses are present in the expression.
        # Counts the number of open parentheses & keeps track of the first ( found.
        parentheses = 0
        leftIdx = -1

        # This try/except block is essentially an if/else block. Since useArray.index('(') raises a ValueError
        # if it can't find '(' in useArray, the next line is not carried out, and parentheses is not incremented.
        try:
            leftIdx = useArray.index('(')
            parentheses += 1
        except ValueError:
            pass

        # If a ValueError was raised, leftIdx = -1 and rightIdx = parentheses = 0.
        rightIdx = leftIdx + 1

        while (parentheses > 0):
            if (useArray[rightIdx] == '('):
                parentheses += 1
            elif (useArray[rightIdx] == ')'):
                parentheses -= 1
            rightIdx += 1

        # Provided parentheses pair isn't empty, runs contents through again; else, removes the parentheses
        if (leftIdx > -1 and rightIdx - leftIdx > 2):
            return reduceExpr(useArray[:leftIdx] + [' '.join(['(',reduceExpr(useArray[leftIdx+1:rightIdx-1])[0],')'])] + useArray[rightIdx:])
        elif (leftIdx > -1):
            return reduceExpr(useArray[:leftIdx] + useArray[rightIdx:])

        # If operator is + or -, hold the first two elements and process the rest of the list first
        if isAddSub(useArray[1]):
            return reduceExpr(useArray[:2] + reduceExpr(useArray[2:]))
        # Else, if operator is * or /, process the first 3 elements first, then the rest of the list
        elif isMultDiv(useArray[1]):
            return reduceExpr(reduceExpr(useArray[:3]) + useArray[3:])
        # Just placed this so the compiler wouldn't complain that the function had no return (since this was called by yet another function).
        return None
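The posted function calls four helpers that are not shown (`hasLetters`, `calculate`, `isAddSub`, `isMultDiv`). A minimal sketch of what they might look like follows; the bodies are assumptions based only on how the function above uses them and on its comments, not the original code.

```python
def hasLetters(tokens):
    # Elements may be multi-token strings such as '( 6 * x )',
    # so check every character of every element.
    return any(ch.isalpha() for token in tokens for ch in token)

def isAddSub(token):
    return token in ('+', '-')

def isMultDiv(token):
    return token in ('*', '/')

def calculate(tokens):
    # Assumed behaviour: evaluate a letter-free expression and return the
    # result as a string. Emptying __builtins__ is only a minimal precaution,
    # not a real sandbox; do not pass untrusted input to eval().
    return str(eval(' '.join(tokens), {'__builtins__': {}}))
```

With these definitions, `calculate(['9', '/', '6'])` gives the string `'1.5'` on Python 3, where `/` is true division.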
Edwin
  • I think you're starting to see the virtue of Sympy :-) I think you're looking at building a full-fledged recursive descent parser for arithmetic expressions, followed by manipulation of the data tree to solve for X. – wberry Jul 14 '11 at 20:13
  • @li.davidm It's still in the logic stages right now. I can't figure out how to implement past the first stumbling block. – Edwin Jul 14 '11 at 20:17
  • @wberry Yes, I know it has to be recursive, otherwise I wouldn't be able to handle nested parentheses. That itself has already been implemented. Also, I guess I wasn't clear enough, because I'm supposed to leave x as x, not try to define x. – Edwin Jul 14 '11 at 20:18
  • Your example is wrong. The result should be `3 + 2 * 6 * x = 3 + 12 * x`. Apart from that, I have to wonder: Are you trying to do symbolic math, or are you just trying to evaluate arithmetic expressions while somehow not breaking on variables? –  Jul 14 '11 at 20:20
  • Even starting with a working parser that gives me free variable tokens in a tree structure, it's probably an evening's worth of work to turn that into a closure that returns either an expression (if there are more free variables) or a value given a value. But that is the approach I would take. Good luck! – wberry Jul 14 '11 at 20:23
  • BTW "x * 6 / x" cannot really be reduced to 6, because it is undefined when x == 0. – wberry Jul 14 '11 at 20:26
  • @delnan Um...I guess I misunderstand the definition of symbolic math. I'm just trying to evaluate the expression without breaking on the variables. And thanks for pointing out the error. – Edwin Jul 14 '11 at 20:28
  • Applying to justin.tv, are you? – Swiss Jul 14 '11 at 20:29
  • @Swiss Yes, but I've been mulling this over for the past 3 hours and couldn't figure it out. I think I know how to do it now. Since @wberry pointed out that I don't have to worry about simplifying x/x, I just need to distribute * and / across ()-bounded expressions, while adding + and - to the ()-bounded expressions. I'm going to give it a try and see if it works – Edwin Jul 14 '11 at 21:05
  • Are you restricted to a finite set of variables (x,y) or can the input contain any previously unknown string? If you know variables, I've got something really simple and easy. – phkahler Jul 14 '11 at 21:17
  • @phkahler It can be any single letter a-z, upper or lowercase. – Edwin Jul 14 '11 at 21:24
  • @wberry: x * 6 / x is defined as 6 as lim x->0. This is verifiable by L'Hospital's rule. – Swiss Jul 15 '11 at 02:15
  • @Swiss I agree that lim [x->0] (x * 6 / x) == 6. But that does not mean the expressions are the same. 0 * 6 / 0 is clearly undefined, and therefore so is x * 6 / x where x == 0. – wberry Jul 15 '11 at 13:53
  • @wberry: No, that is incorrect. – Swiss Aug 01 '11 at 20:00

2 Answers


You need much more processing before you go into operations on symbols. The form you want to get to is a tree of operations with values in the leaf nodes. First you need to do a lexer run on the string to get elements - although if you always have space-separated elements it might be enough to just split the string. Then you need to parse that array of tokens using some grammar you require.
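As a rough illustration of the lexer step, here is a small regex-based tokenizer that also copes with input that is not space-delimited. The token pattern (integers, decimals, single-letter variables, the four operators and parentheses) is an assumption drawn from the question, and `tokenize` is a hypothetical name.

```python
import re

# Assumed token set: decimals, integers, single letters, operators, parentheses.
TOKEN_RE = re.compile(r"\d+\.\d+|\d+|[A-Za-z]|[-+*/()]")

def tokenize(text):
    """Split an infix expression into a list of token strings."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace only separates tokens
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(match.group(0))
        pos = match.end()
    return tokens
```

For input that is always space-separated, `text.split()` yields the same list with less machinery.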

If you need theoretical information about grammars and parsing text, start here: http://en.wikipedia.org/wiki/Parsing. If you need something more practical, go to https://github.com/pyparsing/pyparsing (you don't have to use the pyparsing module itself, but its documentation has a lot of interesting info) or http://www.nltk.org/book.

From 2 * ( ( 9 / 6 ) + 6 * x ), you need to get to a tree like this:

             *
        2         +
               /     *
              9 6   6 x
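One way to build such a tree from a token list is a small recursive-descent parser. The sketch below is only one possible approach: it assumes the grammar `expr := term (('+'|'-') term)*`, `term := factor (('*'|'/') factor)*`, `factor := NUMBER | LETTER | '(' expr ')'`, and it represents each operation as a nested `(op, left, right)` tuple, which is a choice made for this example.

```python
def parse(tokens):
    """Parse a list of token strings into nested (op, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        token = advance()
        if token is None:
            raise ValueError("unexpected end of input")
        if token == "(":
            node = expr()
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        if token[0].isdigit():
            return float(token) if "." in token else int(token)
        if token.isalpha():
            return token  # variables stay as strings
        raise ValueError(f"unexpected token {token!r}")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())  # left-associative
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())  # left-associative
        return node

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree
```

Calling `parse('2 * ( ( 9 / 6 ) + 6 * x )'.split())` returns `('*', 2, ('+', ('/', 9, 6), ('*', 6, 'x')))`, which has the same shape as the tree drawn above.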

Then you can visit each node and decide whether to simplify it. Constant operations are the simplest to eliminate: if all of a node's children are constants, compute the result and replace the node with it. Here, the "/" node becomes 1.5.
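A minimal sketch of that constant-elimination step, assuming the tree is stored as nested `(op, left, right)` tuples with numbers and variable names as leaves (a representation chosen for this example, not something the question prescribes):

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def is_number(node):
    return isinstance(node, (int, float))

def fold_constants(node):
    """Replace every subtree whose operands are all numbers with its value."""
    if not isinstance(node, tuple):
        return node  # a leaf: number or variable name
    op, left, right = node
    left = fold_constants(left)
    right = fold_constants(right)
    if is_number(left) and is_number(right):
        return OPS[op](left, right)
    return (op, left, right)

tree = ("*", 2, ("+", ("/", 9, 6), ("*", 6, "x")))
folded = fold_constants(tree)  # ('*', 2, ('+', 1.5, ('*', 6, 'x')))
```

Subtrees that contain a variable are left alone, so `6 * x` survives untouched. This sketch does not guard against division by zero.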

There are many strategies to continue, but essentially you need to find a way to go through the tree and modify it until there's nothing left to change.
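One such strategy is to apply a handful of rewrite rules bottom-up and repeat until a pass changes nothing. The sketch below assumes the tree is nested `(op, left, right)` tuples and uses three illustrative rules (fold constants, distribute a constant over a sum or difference, merge nested constant factors). The rule set is an assumption and is far from complete; for example, it only recognises a constant on the left of `*`.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def is_number(node):
    return isinstance(node, (int, float))

def rewrite(node):
    """Apply one bottom-up pass of the rewrite rules."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = rewrite(left), rewrite(right)
    # Rule 1: both operands are constants, so compute the value.
    if is_number(left) and is_number(right):
        return OPS[op](left, right)
    if op == "*" and is_number(left) and isinstance(right, tuple):
        inner_op, a, b = right
        # Rule 2: c * (a + b) -> c * a + c * b (likewise for '-').
        if inner_op in ("+", "-"):
            return (inner_op, ("*", left, a), ("*", left, b))
        # Rule 3: c1 * (c2 * t) -> (c1 * c2) * t.
        if inner_op == "*" and is_number(a):
            return ("*", left * a, b)
    return (op, left, right)

def simplify(node):
    """Repeat rewrite passes until the tree stops changing."""
    while True:
        new_node = rewrite(node)
        if new_node == node:
            return node
        node = new_node

result = simplify(("*", 2, ("+", ("/", 9, 6), ("*", 6, "x"))))
# result == ('+', 3.0, ('*', 12, 'x'))
```

The first pass folds `9 / 6` and distributes the 2; the second folds `2 * 1.5` and merges `2 * (6 * x)`; the third finds nothing to change, so the loop stops.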

To print the result, walk the tree again and emit the expression it describes.
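A possible sketch of that final walk, again assuming nested `(op, left, right)` tuples. It emits the space-delimited format used in the question and adds parentheses only where precedence requires them; `to_infix` and its parameters are hypothetical names.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_infix(node, parent_prec=0, is_right=False):
    """Render a tuple tree as a space-delimited infix string."""
    if not isinstance(node, tuple):
        return str(node)
    op, left, right = node
    prec = PRECEDENCE[op]
    text = f"{to_infix(left, prec)} {op} {to_infix(right, prec, True)}"
    # Parenthesize when this node binds more loosely than its parent, or
    # when it is a right operand of equal precedence (needed for - and /;
    # harmless, though redundant, for + and *).
    if prec < parent_prec or (is_right and prec == parent_prec):
        return f"( {text} )"
    return text
```

For example, `to_infix(('+', 3.0, ('*', 12, 'x')))` gives `'3.0 + 12 * x'`, and `to_infix(('*', 2, ('+', 1.5, ('*', 6, 'x'))))` gives `'2 * ( 1.5 + 6 * x )'`.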

viraptor
  • Oh, so that's what wberry was trying to say. Thanks for clarifying it, and sorry I can't upvote your answer right now. (I will once I get the right though.) – Edwin Jul 15 '11 at 00:17
  • Pyparsing is no longer hosted on wikispaces.com. Go to https://github.com/pyparsing/pyparsing – PaulMcG Aug 27 '18 at 13:16
  • @PaulMcG you also have a wikispaces link in your SO bio :) – viraptor Aug 27 '18 at 23:55

If you are parsing expressions in Python, you might consider Python syntax for the expressions and parse them using the ast module (AST = abstract syntax tree).

The advantages of using Python syntax: you don't have to make a separate language for the purpose, the parser is built in, and so is the evaluator. Disadvantages: there's quite a lot of extra complexity in the parse tree that you don't need (you can avoid some of it by using the built-in NodeVisitor and NodeTransformer classes to do your work).

    >>> import ast
    >>> a = ast.parse('x**2 + x', mode='eval')
    >>> ast.dump(a)
    "Expression(body=BinOp(left=BinOp(left=Name(id='x', ctx=Load()), op=Pow(),
    right=Num(n=2)), op=Add(), right=Name(id='x', ctx=Load())))"

Here's an example class that walks a Python parse tree and does recursive constant folding (for binary operations), to show you the kind of thing you can do fairly easily.

    from ast import *

    class FoldConstants(NodeTransformer):
        def visit_BinOp(self, node):
            self.generic_visit(node)
            if isinstance(node.left, Num) and isinstance(node.right, Num):
                expr = copy_location(Expression(node), node)
                value = eval(compile(expr, '<string>', 'eval'))
                return copy_location(Num(value), node)
            else:
                return node

    >>> ast.dump(FoldConstants().visit(ast.parse('3**2 - 5 + x', mode='eval')))
    "Expression(body=BinOp(left=Num(n=4), op=Add(), right=Name(id='x', ctx=Load())))"
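Note that `ast.Num` was deprecated in Python 3.8 in favour of `ast.Constant` and removed in Python 3.12, so the class above will not run unchanged on current interpreters. A sketch of the same transformer adapted for newer versions might look like the following; this is an adaptation, not the answer's original code, and the exact text printed by `ast.dump` varies between Python versions.

```python
import ast

class FoldConstants(ast.NodeTransformer):
    def visit_BinOp(self, node):
        # Fold the children first so nested constant expressions collapse.
        self.generic_visit(node)
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            # Wrap the node so it can be compiled and evaluated on its own.
            expr = ast.fix_missing_locations(ast.Expression(body=node))
            value = eval(compile(expr, "<ast>", "eval"))
            return ast.copy_location(ast.Constant(value=value), node)
        return node

tree = FoldConstants().visit(ast.parse("3**2 - 5 + x", mode="eval"))
print(ast.dump(tree))
```

After the visit, the left operand of the remaining addition is a single `Constant` node holding 4, and the right operand is still the `Name` node for `x`.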
Gareth Rees