7

I am trying to find a good way to determine the depth of a particular C# Expression Tree using an iterative approach. We use expressions for some dynamic evaluation, and under rare (error) conditions the system can try to process an Expression Tree so large that it overflows the stack. I want to check the depth of the tree before allowing it to be evaluated.

AJ Henderson
  • Are you sure that you don't have infinite recursion there? The stack is pretty large. This post suggests that you can make up to 18,000 recursive calls (http://stackoverflow.com/questions/4513438/c-sharp-recursion-depth-how-deep-can-you-go) – alex Mar 29 '13 at 18:21
  • @alex - positive. We calculated out the exact depth at which we were seeing the problem and were able to prove that if we simplified the expression tree to that depth it would run ok, but increasing it by 1 caused a problem. For our situation, it was an expression depth of 517 with 3 stack frames per recursion in the expression tree parsing. – AJ Henderson Mar 29 '13 at 18:24
  • 1
    @alex The number of recursive calls you can make is a product of the size of your stack (the default of which can be changed and how much you have at any given point depends on previous code execution) as well as the size of the data you're putting on the stack, which is directly affected by the number and size of parameters in your function calls. So 18,000 for a very specific case. – Pete Mar 29 '13 at 18:26
  • What kind of tree are you working with? Just to get an idea. And what have you tried so far? I don't see any solution that doesn't include recursion. – Lars Udengaard Mar 29 '13 at 18:36
  • @LarsUdengaard It's a C# [ExpressionTree](http://msdn.microsoft.com/en-us/library/bb397951.aspx). I think there may not be an answer to this, but figured I should check to see if anyone knows of a way to walk them iteratively. For a normal tree, it would just be a matter of adding nodes needing processing to a list, opening each node, adding its child nodes to the collection that needs to be processed, and then removing it from the list. It is also necessary to track the depth of each of the nodes in the list. I don't see a way to do that with ExpressionTree though. – AJ Henderson Mar 29 '13 at 18:38
  • Yes, I understand, I'm just trying to grasp the full problem. :-) What is the tree describing? Which depths are you having problems with? – Lars Udengaard Mar 29 '13 at 18:45
  • @LarsUdengaard - it's a filter being built up for a LINQ expression. Someone fed in a list of things to add checks for that was 1000 long and it added 1000 OR expressions. This made it run recursively down the tree and blew up when it reached the 518th node of the tree (because it filled the entire stack.) – AJ Henderson Mar 29 '13 at 18:47
  • It is of course *possible* to visit an expression tree iteratively rather than recursively but it is not trivial to do so. Two things would help. First, can you describe for us -- ideally with some code -- how you are visiting the expression tree now? And second, is there any particular "path" that is likely to be deep recursively? For example, if you are likely to have the expression tree `x=>x+x+x+x+....+x`, that's going to consistently recurse down *one side* of the add; sometimes it is easier to special-case the visitor to be iterative just in the common long path. – Eric Lippert Mar 29 '13 at 18:48
  • Ah, I see while I was typing that you answered my second question. Knowing that we are likely to be deeply recursive on OR makes the problem much easier to solve. (And of course compilers have this problem as well! One of the last things I did on Roslyn was make the code that processes left-associative binary operators iterative on the left so that it would be less likely to blow the stack.) – Eric Lippert Mar 29 '13 at 18:50
  • @EricLippert - at the moment, I'm just trying to do an evaluation and it is overflowing the stack. I'm trying to write some code that can detect if the stack will overflow and handle it more gracefully (possibly by multi-threading it and increasing stack size or possibly by simply having a more descriptive error), so there isn't any visitor code yet. Also, thanks for taking a look. – AJ Henderson Mar 29 '13 at 18:54
  • @EricLippert Ok. I can't imagine how. I will have to read up on this. Got any good blog posts on the subject? :) – Lars Udengaard Mar 29 '13 at 19:02
  • @AJHenderson Listen to Eric, he knows what he is talking about :) If you are evaluating the expression with a visitor, why not track the depth while evaluating, and break off the evaluation when you reach some predefined maximum? But this really seems like symptom management. It would be better to find a solution that can handle the data size. – Lars Udengaard Mar 29 '13 at 19:08
  • The "built-in" expression tree analyzers like `Compile` do not have any magic in them to prevent deep recursions; they assume that the trees are shallow. That's my bad; sorry about that. If that's the problem you face you're going to have to ask Microsoft to fix it. If you want to make your own visitor that is iterative rather than recursive, I can give you some tips on that. – Eric Lippert Mar 29 '13 at 19:13
  • While you're asking Microsoft to fix it, also ask them to make it possible to catch a Stack Overflow so you don't need a check like this in the first place. – Gabe Mar 29 '13 at 19:39
  • This question might also be of interest to you: http://stackoverflow.com/questions/15684330/recursive-approach-versus-stack-for-depth-first-search/15684914#15684914 – Eric Lippert Mar 29 '13 at 20:33
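
To make the failure mode from the comments concrete, here is a minimal sketch of how folding a list of values into OR checks produces a tree that is deep on the left, and how that one long path can be measured with a plain loop, in the spirit of special-casing the common long path. `OrChainDemo`, `BuildOrChain`, and `LeftSpineDepth` are hypothetical names, not the asker's code, and the sketch assumes the filter is built by folding left to right.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

static class OrChainDemo
{
    // Hypothetical builder: x == v0 || x == v1 || ... folded left to right,
    // so every new OrElse node wraps the previous chain as its Left operand.
    public static Expression BuildOrChain(ParameterExpression x, int[] values)
    {
        Expression body = Expression.Equal(x, Expression.Constant(values[0]));
        foreach (var v in values.Skip(1))
            body = Expression.OrElse(body, Expression.Equal(x, Expression.Constant(v)));
        return body;
    }

    // Counts the binary nodes along the left edge with a loop, not recursion.
    // This only measures the left spine; it ignores depth on the right side.
    public static int LeftSpineDepth(Expression e)
    {
        int depth = 0;
        while (e is BinaryExpression b)
        {
            depth++;
            e = b.Left;
        }
        return depth;
    }
}
```

A check like this is cheap enough to run before evaluation, but it is only a partial answer: it assumes the depth comes from one left-leaning chain of binary operators, which matches the case described here and may not match others.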

3 Answers

9

Rather than try to solve your problem for expression trees specifically, let me describe for you some general techniques for dealing with badly-behaved trees.

You might want to start by reading my series of articles on solving the problem you pose: how do I determine the depth of a tree without using recursion?

http://blogs.msdn.com/b/ericlippert/archive/2005/07/27/recursion-part-one-recursive-data-structures-and-functions.aspx

Those articles were written back when I was working on JScript, so the examples are in JScript. It's not too hard to see how to apply these concepts to C# though.

Let me give you a little toy example in C# of how to do an operation on a recursive data structure without doing a full recursion. Suppose we have the following binary tree. (Let's assume, without loss of generality, that each node has either zero or two children, never exactly one.)

class Node 
{
    public Node Left { get; private set; }
    public Node Right { get; private set; }
    public string Value { get; private set; }
    public Node(string value) : this(null, null, value) {}
    public Node(Node left, Node right, string value)
    {
        this.Left = left;
        this.Right = right;
        this.Value = value;
    }
}
...
Node n1 = new Node("1");
Node n2 = new Node("2");
Node n3 = new Node("3");
Node n4 = new Node("4");
Node n5 = new Node("5");
Node p1 = new Node(n1, n2, "+");
Node p2 = new Node(p1, n3, "*");
Node p3 = new Node(n4, n5, "+");
Node p4 = new Node(p2, p3, "-");

So we have the tree p4:

                -
             /     \
            *       +
           / \     / \
          +   3   4   5
         / \
        1   2

and we wish to print out p4 as a parenthesized expression

   (((1+2)*3)-(4+5))

The recursive solution is straightforward:

 static void RecursiveToString(Node node,  StringBuilder sb)
 {
     // Again, assuming either zero or two children.
     if (node.Left == null) // leaf: just print the value
         sb.Append(node.Value);
     else
     {
         sb.Append("(");
         RecursiveToString(node.Left, sb);
         sb.Append(node.Value);
         RecursiveToString(node.Right, sb);
         sb.Append(")");
      }
 }

Now suppose we know the tree to be likely "deep" on the left, but "shallow" on the right. Can we eliminate the recursion on the left?

 static void RightRecursiveToString(Node node,  StringBuilder sb)
 {
     // Again, assuming either zero or two children.
     var stack = new Stack<Node>();
     stack.Push(node);
     while(stack.Peek().Left != null)
     {
         sb.Append("(");
         stack.Push(stack.Peek().Left);
     }
     while(stack.Count != 0)
     {
         Node current = stack.Pop();
         sb.Append(current.Value);
         if (current.Right != null)
         {
             RightRecursiveToString(current.Right, sb);
             sb.Append(")");
         }
     }
 }

The recurse-on-the-right only version is of course much harder to read and much harder to reason about, but it doesn't blow the stack.

Let's go through our example.

push p4
output (
push p2
output (
push p1
output (
push n1
first loop ends (n1 has no left child); second loop condition is met
pop n1
output 1
go back to the top of the loop
pop p1
output +
recurse on n2 -- this outputs 2
output )
go back to the top of the loop
pop p2
output *
recurse on n3 -- this outputs 3
output )
go back to the top of the loop
pop p4
output -
recurse on p3
  push p3 
  push n4
  output (
  loop condition is met
  pop n4
  output 4
  go back to the top of the loop
  pop p3
  output +
  recurse on n5 -- this outputs 5
  output )
  loop condition is not met; return.
output )
loop condition is not met, return.

And what do we output? (((1+2)*3)-(4+5)), as desired.

So you've seen here that I can go from two recursions down to one. We can use similar techniques to go from one recursion down to none. Making this algorithm fully iterative -- so that it recurses neither on the left nor the right -- is left as an exercise.
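
One possible way to do that exercise, as a sketch rather than the intended solution, is to keep an explicit stack of pending work, where each entry is either a node still to be expanded or a piece of literal text to emit. `TreePrinter`, `IterativeToString`, and the tuple field names are hypothetical; the `Node` class is repeated in compact form only so the block stands alone, and it keeps the assumption of zero or two children.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

class Node
{
    public Node Left { get; }
    public Node Right { get; }
    public string Value { get; }
    public Node(string value) : this(null, null, value) { }
    public Node(Node left, Node right, string value)
    {
        Left = left;
        Right = right;
        Value = value;
    }
}

static class TreePrinter
{
    public static string IterativeToString(Node root)
    {
        var sb = new StringBuilder();
        // A work item is either a node to expand (Pending) or text to emit (Literal).
        var work = new Stack<(Node Pending, string Literal)>();
        work.Push((root, null));

        while (work.Count > 0)
        {
            var (node, text) = work.Pop();
            if (node == null)
            {
                sb.Append(text);          // literal operator or closing parenthesis
            }
            else if (node.Left == null)
            {
                sb.Append(node.Value);    // leaf
            }
            else
            {
                sb.Append("(");
                // Push in reverse, so items pop as: left, operator, right, ")".
                work.Push((null, ")"));
                work.Push((node.Right, null));
                work.Push((null, node.Value));
                work.Push((node.Left, null));
            }
        }
        return sb.ToString();
    }
}
```

The pending work now lives on the heap instead of the call stack, so the depth of the tree on either side no longer affects how many stack frames are used.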

(And incidentally: I ask a variation of this problem as an interview question, so if you ever get interviewed by me, you now have an unfair advantage!)

Eric Lippert
  • Thanks for the detailed write up on doing an iterative search of a tree vs a recursive one. I think it will be very valuable for someone else who comes to the question, but my problem is not actually how to simply iteratively walk a tree (simply add each node's children and the depth of that node to a queue, remove the processed node from the queue, and continue processing with the next node in the queue. Track the max depth stored and it will iteratively scan a tree.) My problem is more that I have thus far been unable to track down how to walk the ExpressionTree structure specifically. – AJ Henderson Apr 01 '13 at 13:22
  • I found the ExpressionVisitor, but if my understanding is correct, it appears to visit recursively and is therefore unsuitable. Is there some other structure which exposes the nodes cleanly so that I can iterate them? – AJ Henderson Apr 01 '13 at 13:23
  • @AJHenderson: My suggestion is that you write your own version of the expression tree visitor that it is iterative on the nodes that are likely to be deeply recursive. – Eric Lippert Apr 01 '13 at 13:49
5

The ExpressionVisitor that is included in .NET is recursive, but with a trick you can turn it into an iterative one.

Basically, you're processing a queue of nodes. For each node in the queue, use base.Visit() to visit all of its children, but then add those children into the queue instead of recursively processing them right away.

This way, you don't have to write code specific to each Expression subtype, but you also work around the recursive nature of ExpressionVisitor.

class DepthVisitor : ExpressionVisitor
{
    private readonly Queue<Tuple<Expression, int>> m_queue =
        new Queue<Tuple<Expression, int>>();
    private bool m_canRecurse;
    private int m_depth;

    public int MeasureDepth(Expression expression)
    {
        m_queue.Enqueue(Tuple.Create(expression, 1));

        int maxDepth = 0;

        while (m_queue.Count > 0)
        {
            var tuple = m_queue.Dequeue();
            m_depth = tuple.Item2;

            if (m_depth > maxDepth)
                maxDepth = m_depth;

            m_canRecurse = true;

            Visit(tuple.Item1);
        }

        return maxDepth;
    }

    public override Expression Visit(Expression node)
    {
        if (m_canRecurse)
        {
            m_canRecurse = false;
            base.Visit(node);
        }
        else if (node != null) // skip absent children, e.g. the Object of a static method call
            m_queue.Enqueue(Tuple.Create(node, m_depth + 1));

        return node;
    }
}
svick
  • 1
    This appears to be exactly what I was looking for since it gives an iterative way to get at the nodes in each element. The one thing I'm not clear on is that apparently the Visit seems to not work quite the way I expected. How is the private queue kept consistent between the Visitors? Is the same visitor reused? – AJ Henderson Apr 01 '13 at 13:31
  • @AJHenderson There are no other visitors. If you create one visitor, then there will be only one visitor. `base.Visit()` doesn't ever create new visitors, I think that wouldn't make any sense. – svick Apr 01 '13 at 15:08
  • how are the sub-expressions being enqueued then? I don't see any calls that would add them to the queue iteratively. I could only find them by going in to the more specific types of Expression objects by hand. – AJ Henderson Apr 01 '13 at 15:46
  • @AJHenderson The enqueuing of each expression is done by the `m_queue.Enqueue()` line. Processing each subexpression of an expression is done by calling `base.Visit()`, which calls `Visit()` for each subexpression. – svick Apr 01 '13 at 15:59
  • Right, but isn't that going to be a different visitor since it is going down a level of recursion? – AJ Henderson Apr 01 '13 at 16:00
  • @AJHenderson Like I said before, there is no “different visitor”. The whole code executes using the same `DepthVisitor`. – svick Apr 01 '13 at 16:02
2

Rather than using recursion to traverse a tree, you can always use an explicit in-memory structure instead. If you want to closely mimic the recursive behavior you can even use an explicit Stack. Since this stores all of the information about nodes yet to be processed on the heap, it'll take a heck of a lot more to run out of space.

Here is a general-purpose method that traverses a tree-based structure (iteratively, not recursively) and returns a flattened sequence of all of the items, each paired with its depth.

public static IEnumerable<Tuple<T, int>> TraverseWithDepth<T>(IEnumerable<T> items
    , Func<T, IEnumerable<T>> childSelector)
{
    var stack = new Stack<Tuple<T, int>>(
        items.Select(item => Tuple.Create(item, 0)));
    while (stack.Any())
    {
        var next = stack.Pop();
        yield return next;
        foreach (var child in childSelector(next.Item1))
        {
            stack.Push(Tuple.Create(child, next.Item2 + 1));
        }
    }
}

Now to use this, all we need to do is pass in the root node(s) and a function that maps each element to its direct children, and then take the max of the depth. (GetDirectChildren is not shown here; it must return the immediate sub-expressions of a node. Note also that the root is reported at depth 0.) Due to deferred execution, each item yielded by the traverse function won't be retained in memory by Max, so the only items held in memory are the nodes that haven't been processed yet but whose parent has been processed.

public static int GetDepth(Expression t)
{
    return TraverseWithDepth(new[] { t }, GetDirectChildren)
        .Max(pair => pair.Item2);
}
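
`GetDepth` relies on a `GetDirectChildren` function that this answer does not define. One way it might be written, without a case for every Expression subtype, is to borrow the visitor trick from the other answer: let `ExpressionVisitor` dispatch on the root node only, and record each child it asks to visit instead of descending into it. This is a sketch under that assumption; `ChildCollector` is a hypothetical name, and unusual node kinds (for example non-reducible extension nodes) are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical helper: collects only the immediate children of one node.
sealed class ChildCollector : ExpressionVisitor
{
    private readonly List<Expression> _children = new List<Expression>();
    private bool _atRoot;

    public static IEnumerable<Expression> GetDirectChildren(Expression node)
    {
        var collector = new ChildCollector { _atRoot = true };
        collector.Visit(node);
        return collector._children;
    }

    public override Expression Visit(Expression node)
    {
        if (node == null)
            return null;                // absent child, nothing to record

        if (_atRoot)
        {
            _atRoot = false;
            return base.Visit(node);    // dispatches to VisitXxx, which calls Visit per child
        }

        _children.Add(node);            // record the child; do not descend into it
        return node;
    }
}
```

With this in place, `ChildCollector.GetDirectChildren` can be passed as the `childSelector` argument of `TraverseWithDepth`. Each call recurses only one level into the visitor, so the stack use stays bounded regardless of how deep the tree is.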
Servy
  • 1
    A nice solution. One small down side is that if the child selector iterates the children from "left to right", this code enumerates them in order from "right to left". If that matters, you can always say `foreach(var child in childSelector(next.Item1).Reverse())`. – Eric Lippert Mar 29 '13 at 20:15