First, context:
As a side project, I'm building a computer algebra system in Python that yields the steps it takes to solve an equation.
So far, I've been able to parse algebraic expressions and equations into an expression tree. It's structured something like this (not the actual code—may not be running):
# Other operators and math functions are based off this.
# Numbers and symbols also have their own classes with 'parent' attributes.
class Operator(object):
def __init__(self, *args):
self.children = args
for child in self.children:
child.parent = self
# the parser does something like this:
expr = Add(1, Mult(3, 4), 5)
On top of this, I have a series of functions that operate recursively to simplify expressions. They're not purely functional, but I'm trying to avoid relying on mutability for operations, instead returning a modified copy of the node I'm working with. Each function looks something like this:
def simplify(node):
for index, child in enumerate(node.children):
if isinstance(child, Operator):
node.children[index] = simplify(node)
else:
# perform some operations to simplify numbers and symbols
pass
return node
The challenge comes in the "step by step" part. I'd like for my "simplification" functions to all be nested generators that "yield" the steps it takes to solve something. So basically, every time each function performs an operation, I'd like to be able to do something like this: yield (deepcopy(node), expression, "Combined like terms.")
so that whatever is relying on this library can output something like:
5x + 3*4x + 3
5x + 12x + 3 Simplified product 3*4x into 12x
17x + 3 Combined like terms 5x + 12x = 17x
However, each function only has knowledge about the node
it's operating on, but has no idea what the overall expression
looks like.
So this is my question: What would be the best way of maintaining the "state" of the entire expression tree so that each "step" has knowledge of the entire expression?
Here are the solutions I've come up with:
- Do every operation in place and either use a global variable or an instance variable in a class to store a pointer to the equation. I don't like this because unit testing is tougher, since now I have to set up the class first. You also lose other advantages of a more functional approach.
- Pass through the root of the expression to every function. However, this either means I have to repeat every operation to also update the expression or that I have to rely on mutability.
- Have the top level function 'reconstruct' the expression tree based on each step I yield. For example, if I yield
5x + 4x = 9x
, have the top level function find the (5x + 4x) node and replace it with '9x'. This seems like the best solution, but how best to 'reconstruct' each step?
Two final, related questions: Does any of this make sense? I have a lot of caffeine in my system right now and have no idea if I'm being clear.
Am I worrying too much about mutability? Is this a case of premature optimization?