6

I have a binary tree whose nodes operate on some shared data. I initially implemented a standard post-order recursive traversal:

def visit_rec(self, node, data):
    if node:
        self.visit_rec(node.left, data)
        self.visit_rec(node.right, data)

        node.do_stuff(data)

I thought I could improve on this by using a generator, so that I could reuse the same traversal method for other purposes and avoid passing the same data argument around constantly. That implementation is shown below.

def visit_rec_gen(self, node):
    if node:
        for n in self.visit_rec_gen(node.left):
            yield n
        for n in self.visit_rec_gen(node.right):
            yield n

        yield node

for node in self.visit_rec_gen(self.root):  # self.root: the tree's root node
    node.do_stuff(data)

However, this was far slower than the previous version (~50s versus ~17s) and used far more memory. Have I made a mistake in my generator version? I'd prefer to use this approach, but not at the expense of performance.

EDIT: Something I should have mentioned initially was that these results were obtained under PyPy 2.3.1, rather than standard CPython.
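For reference, a minimal, self-contained harness along these lines (the `Node` class, `build` helper, and tree depth here are illustrative stand-ins, not the asker's actual code) reproduces the comparison under either interpreter:

```python
import itertools
import time

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

    def do_stuff(self, data):
        data.append(self.value)

def visit_rec(node, data):
    # Plain recursive post-order traversal.
    if node:
        visit_rec(node.left, data)
        visit_rec(node.right, data)
        node.do_stuff(data)

def visit_rec_gen(node):
    # Generator-based post-order traversal.
    if node:
        for n in visit_rec_gen(node.left):
            yield n
        for n in visit_rec_gen(node.right):
            yield n
        yield node

_ids = itertools.count()

def build(depth):
    # Full binary tree of the given depth, with unique node values.
    if depth == 0:
        return None
    left, right = build(depth - 1), build(depth - 1)
    return Node(next(_ids), left, right)

root = build(16)
data1, data2 = [], []
t0 = time.time()
visit_rec(root, data1)
t1 = time.time()
for node in visit_rec_gen(root):
    node.do_stuff(data2)
t2 = time.time()
print("recursive: %.3fs  generator: %.3fs" % (t1 - t0, t2 - t1))
```

Both traversals should produce the same visit order; only the timings differ.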

Stuart Lacy
  • At first glance, I notice that your functions are not really doing the same thing. Your generator version checks each node both when it enters the function and when it makes the recursive call, while your original function only checks the node when it enters the function. – murgatroid99 Jul 25 '14 at 18:22
  • Also, what class are these methods part of? They look like you'd be better off with straight functions. – Silas Ray Jul 25 '14 at 18:24
  • I've checked and the second check in the recursive call is unnecessary but doesn't really have any impact upon the performance. I'll remove it from the question. They're part of my BinaryTree class. – Stuart Lacy Jul 25 '14 at 18:27
  • Don't you want `yield self.visit_rec_gen(node.left)` instead of `for n in self.visit_rec_gen(node.left): yield n` ? – colinro Jul 25 '14 at 18:33
  • @colinro that returns a generator object rather than a node. – Stuart Lacy Jul 25 '14 at 18:36
  • possible duplicate of [Python recursive generators performance](http://stackoverflow.com/questions/16731561/python-recursive-generators-performance) – dano Jul 25 '14 at 18:39
  • These functions aren't really the same at all. The first function is very straightforwardly and simply grabbing some variables from the local namespace and off an object, and calling itself twice. The second, since it is using generators, is building a bunch of re-entrant stack frames, and constantly flipping back and forth between them. It's more flexible code, but it's also much less efficient. – Silas Ray Jul 25 '14 at 18:42
  • Perhaps a purely iterative solution would avoid the function call overhead. There are some implementations [here](http://en.wikipedia.org/wiki/Tree_traversal#Implementations). – Kevin Jul 25 '14 at 19:04
  • @Kevin That was my initial solution; it took about 25s, although it used double the memory of the recursive solution. – Stuart Lacy Jul 25 '14 at 19:09
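The explicit-stack, two-pass post-order traversal that the comments allude to can be sketched as follows (the `Node` class here is a hypothetical stand-in for the asker's):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def visit_iter(root, func):
    # Two-stack iterative post-order: the first pass pops nodes in
    # root-right-left order onto `out`; unwinding `out` then yields
    # left-right-root (post-order) without any recursion.
    if root is None:
        return
    stack, out = [root], []
    while stack:
        node = stack.pop()
        out.append(node)
        if node.left:
            stack.append(node.left)
        if node.right:
            stack.append(node.right)
    while out:
        func(out.pop())

order = []
tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
visit_iter(tree, lambda n: order.append(n.value))
# post-order of this tree: 4, 5, 2, 3, 1
```

This trades the function-call overhead of recursion for two explicit list stacks, at the cost of holding every node in memory during the first pass.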

3 Answers

6

On PyPy, function calls are much more highly optimized than generators or iterators.

There are many things that have different performance characteristics in PyPy (for example, PyPy's `itertools.islice()` performs abysmally).

You're doing the right thing by measuring the performance to see which way is fastest.

Also note that PyPy has tools to show the code it generates, so you can get a more detailed answer to the question "what does it do". The question of "why does it do that" has a human component to its answer, involving what was convenient to implement and the proclivities of the implementers.

Raymond Hettinger
  • I don't really see how the generator version could be faster in any interpreter. It's just fundamentally more complicated. – Silas Ray Jul 25 '14 at 18:47
  • As mentioned in another comment, I reran the examples under CPython (2.7) and got ~25s and ~35s respectively. The recursive generator, however, used far less memory (on the order of 10x) than when run under PyPy. So yes, while the generator version was still slower under CPython, the difference in speed was smaller, and it used substantially less memory. – Stuart Lacy Jul 25 '14 at 18:51
  • @Silas In general, generators tend to do less work than function calls because generators create a stack frame once and re-use it on every call. In contrast, CPython's function calls tend to be slow because they create a new stack frame on every call. In PyPy, much of the stackframe creation overhead is optimized away, hence the speed ratio of recursive functions to generator calls is more favorable. – Raymond Hettinger Jul 25 '14 at 18:55
  • @RaymondHettinger I get that, but in this specific case, it's not particularly relevant. The non-generative version creates exactly one frame per node as does the generative version, but unlike the generative version, it's not constantly walking up and down the tree of frames to bubble yielded nodes up through all the nested generators. – Silas Ray Jul 25 '14 at 19:02
  • The other side of the equation is that PyPy does a better job optimizing functions than it does generators. That isn't an intrinsic difference and it may change as PyPy continues to improve. – Raymond Hettinger Jul 25 '14 at 19:07
  • generators are hard precisely because the frame stays alive, so someone has to put stuff on the heap frame (it escapes). In normal calls it does not escape so it does not have to be created. There are possible hacks, but it's not *that* easy to improve. – fijal Aug 05 '14 at 15:41
3

If you're using Python 3.3+, the `yield from` statement is optimized to be faster than looping over the sub-generator just to re-yield its items:

def visit_rec_gen(self, node):
    if node:
        yield from self.visit_rec_gen(node.left)
        yield from self.visit_rec_gen(node.right)
        yield node
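As a standalone sketch (a free function rather than a method, with a hypothetical `Node` class, since the asker's classes aren't shown), the same traversal reads:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def visit_rec_gen(node):
    # Post-order traversal delegating to sub-generators with `yield from`
    # (Python 3.3+), avoiding the explicit inner for-loops.
    if node:
        yield from visit_rec_gen(node.left)
        yield from visit_rec_gen(node.right)
        yield node

tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
values = [n.value for n in visit_rec_gen(tree)]
# post-order of this tree: [4, 5, 2, 3, 1]
```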
shx2
  • I'm currently using PyPy 2.3.1 (which uses Python 2.7.6) so I can't use this, otherwise it looks useful. – Stuart Lacy Jul 25 '14 at 18:32
  • @Stu, are the results similar with CPython? I wonder if this is a pypy specific issue? – dano Jul 25 '14 at 18:42
  • @Stu, also you should probably make the fact that you're using PyPy more prominent in the question. – dano Jul 25 '14 at 18:43
  • Good point. With CPython (2.7) the times were ~25s and ~35s respectively. The recursive generator, however, used far less memory (on the order of 10x) than when run under PyPy. – Stuart Lacy Jul 25 '14 at 18:49
2

The generator method is simply less efficient, due to the overhead inherent in generators. However, you can get the flexibility of the generator approach with much of the efficiency of the non-generator version by using a callback-based system.

# NOTE that this should be a method on Node, not Tree
def apply_to_children_and_self(self, func, *args, **kwargs):
    if self.left:
        self.left.apply_to_children_and_self(func, *args, **kwargs)
    if self.right:
        self.right.apply_to_children_and_self(func, *args, **kwargs)
    func(self, *args, **kwargs)

...

head.apply_to_children_and_self(Node.do_stuff, data)
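A runnable sketch of that approach, with a minimal stand-in `Node` class (the question doesn't show the original `Node`/`BinaryTree` classes, so `do_stuff` here just collects values):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

    def do_stuff(self, data):
        data.append(self.value)

    def apply_to_children_and_self(self, func, *args, **kwargs):
        # Post-order: recurse into both children first, then apply
        # the callback to this node.
        if self.left:
            self.left.apply_to_children_and_self(func, *args, **kwargs)
        if self.right:
            self.right.apply_to_children_and_self(func, *args, **kwargs)
        func(self, *args, **kwargs)

head = Node(1, Node(2, Node(4), Node(5)), Node(3))
data = []
head.apply_to_children_and_self(Node.do_stuff, data)
# data now holds the post-order values: [4, 5, 2, 3, 1]
```

Passing the unbound `Node.do_stuff` as the callback works because the traversal supplies each node as the first positional argument.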
Silas Ray
  • I've tried implementing this as is, but get an error in the recursive call saying that "node" doesn't exist. If I change that to be 'self' then I get a TypeError saying Node object is not callable. – Stuart Lacy Jul 25 '14 at 19:24
  • Sorry, I was editing that while I was working on it, but I forgot to remove those... There should be no nodes passed explicitly there at all. – Silas Ray Jul 25 '14 at 19:29
  • This performs about the same as the standard recursive implementation, but I like the flexibility of it, thanks! – Stuart Lacy Jul 25 '14 at 20:00