While I understand that tail recursion optimization is non-Pythonic, I came up with a quick hack for a question on here that was deleted as soon as I was ready to post.
With the default recursion limit of 1000, deeply recursive algorithms are not usable in Python. But recursion is sometimes great for thinking through an initial solution. Since functions are first-class in Python, I played with returning a valid function and the next value, then driving the process from a loop of single calls until done. I'm sure this isn't new.
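As a hypothetical aside (not part of my benchmark below), the same pattern extends to functions with more than one piece of state by packing the state into a tuple; fact_tail and trampoline here are names I made up for illustration:

def fact_tail(state):
    n, acc = state
    if n <= 1:
        return None, acc                    # done: no next call, acc is the answer
    return fact_tail, (n - 1, acc * n)      # next step and its argument

def trampoline(method, val):
    while method:                           # keep taking single steps until done
        method, val = method(val)
    return val

print(trampoline(fact_tail, (5, 1)))        # 120, with constant stack depth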
What I found interesting is that I expected the extra overhead of passing the function back and forth to make this slower than normal recursion. In my crude testing, it instead took 30-50% of the time of normal recursion (with the added bonus of allowing very long recursions).
Here is the code I'm running:
from contextlib import contextmanager
import time

# Timing code from StackOverflow most likely.
@contextmanager
def time_block(label):
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    try:
        yield
    finally:
        end = time.perf_counter()
        print('{} : {}'.format(label, end - start))

# Purely recursive function
def find_zero(num):
    if num == 0:
        return num
    return find_zero(num - 1)

# Function that returns a tuple of (next method, next call value)
def find_zero_tail(num):
    if num == 0:
        return None, num  # no next call: num is the final result
    return find_zero_tail, num - 1

# Iterative recurser: keep making single calls until no method is returned
def tail_optimize(method, val):
    while method:
        method, val = method(val)
    return val

with time_block('Pure recursion: 998'):
    find_zero(998)

with time_block('Tail Optimize Hack: 998'):
    tail_optimize(find_zero_tail, 998)

with time_block('Tail Optimize Hack: 1000000'):
    tail_optimize(find_zero_tail, 1000000)
# One Run Result:
# Pure recursion: 998 : 0.000372791020758
# Tail Optimize Hack: 998 : 0.000163852100569
# Tail Optimize Hack: 1000000 : 1.51006975627
Why is the second style faster?
My guess is the overhead of creating stack frames, but I'm not sure how to verify that.
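One way to probe this might be timeit.repeat, which times the same statement several times within one process (a rough sketch, separate from the time_block runs above):

import timeit

# Five repeats of 1000 calls each: if frame allocation is a one-time cost,
# the later repeats should come out faster than the first.
print(timeit.repeat('find_zero(998)', globals=globals(), number=1000, repeat=5))
print(timeit.repeat('tail_optimize(find_zero_tail, 998)', globals=globals(), number=1000, repeat=5))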
Edit:
While playing with call counts, I made a loop to try both approaches at various num values, roughly like the sketch below. The recursive version was much closer to parity when I looped and called each one multiple times.
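A reconstruction of that experiment (the exact values and repeat counts I tried are not recorded, so these are placeholders):

for n in (10, 100, 500, 998):
    with time_block('pure recursion, num={}'.format(n)):
        for _ in range(1000):
            find_zero(n)
    with time_block('tail hack, num={}'.format(n)):
        for _ in range(1000):
            tail_optimize(find_zero_tail, n)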
So I added this before the timing code; it is just find_zero under a new name:
def unrelated_recursion(num):
    if num == 0:
        return num
    return unrelated_recursion(num - 1)

unrelated_recursion(998)
Now the tail-optimized call takes 85% of the time of the full recursion.
So my theory is that the remaining 15% penalty is the overhead of the deep call stack versus the single stack frame.
The reason I saw such a huge disparity in execution time when running each only once was the penalty of allocating the stack memory and structures. Once those are allocated, the cost of using them drops drastically.
Because my algorithm is dead simple, the memory structure allocation is a large portion of the execution time.
When I cut my stack priming call to unrelated_recursion(499), I get about halfway between the fully primed and unprimed stack in find_zero(998) execution time. This is consistent with the theory.
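A quick way to check the theory directly is to time the same call twice in a row in a fresh process (a sketch reusing the time_block helper from above; if frame allocation dominates, the second call should be much faster):

with time_block('find_zero(998), cold'):
    find_zero(998)
with time_block('find_zero(998), warm'):
    find_zero(998)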