I've written a program to benchmark two ways of finding "the longest Collatz chain for integers less than some bound".
The first way is "backtrack memoization": it keeps the current chain in a stack, from the starting value until it reaches a value that is already in the hash table, and then pops every value off the stack into the hash table (with incrementing chain-length values).
The second way is with simpler memoization that only memoizes the starting value of the chain.
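To make the difference concrete, here is a minimal sketch of what each of the two variants described above stores after walking a single chain. The helper name `walk_chain` and its `cache` argument are made up for this illustration and are not part of my benchmark.

```python
def walk_chain(start, cache, backtrack):
    """Walk from start until a cached value is hit, then memoize (illustrative helper)."""
    chain = []
    n = start
    while n not in cache:
        chain.append(n)
        n = 3 * n + 1 if n % 2 else n // 2
    if backtrack:
        # Memoize every value on the path; the last value visited is 1 step from n.
        for offset, value in enumerate(reversed(chain), start=1):
            cache[value] = cache[n] + offset
    elif chain:
        # Memoize only the starting value.
        cache[start] = cache[n] + len(chain)
    return cache

with_backtracking = walk_chain(3, {1: 0}, backtrack=True)
without_backtracking = walk_chain(3, {1: 0}, backtrack=False)
print(sorted(with_backtracking))     # [1, 2, 3, 4, 5, 8, 10, 16]
print(sorted(without_backtracking))  # [1, 3]
```

Starting from 3, the chain is 3, 10, 5, 16, 8, 4, 2, 1, so the backtracking variant ends up with eight entries while the simpler variant only has two.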
To my surprise and confusion, the algorithm that memoizes the entire sub-chain up to the first hash table hit is consistently slower than the algorithm that only memoizes the starting value.
I'm wondering if this is due to one of the following factors:
Is Python really slow with stacks, slow enough to offset the performance gains?
Is my code/algorithm bad?
Is it simply the case that, statistically, as integers grow large, the time spent revisiting the non-memoized elements of previously calculated Collatz chains/sub-chains is asymptotically minimal, to the point that any overhead due to popping elements off a stack simply isn't worth the gains?
In short, I'm wondering if this unexpected result is due to the language, the code, or math (i.e. the statistics of Collatz).
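One way I could try to separate the "math" explanation from the "language" one is to count the work instead of timing it. The sketch below is a hypothetical instrumented rewrite (`count_work` is not part of my benchmark, and the bound of 100000 is just my default) that counts Collatz steps and dictionary writes for each strategy. If backtracking saves only a few steps relative to the extra writes it performs, that would point at the statistics of Collatz rather than at Python's lists.

```python
def count_work(max_num, backtrack):
    """Return (collatz_steps, dict_writes) for one strategy (hypothetical helper)."""
    lengths = {1: 0}
    steps = 0
    writes = 0
    for target in range(1, max_num):
        n = target
        path = []
        while n not in lengths:
            path.append(n)
            n = 3 * n + 1 if n % 2 else n // 2
            steps += 1
        if backtrack:
            # One dictionary write per value visited on the way to the cached value.
            for offset, value in enumerate(reversed(path), start=1):
                lengths[value] = lengths[n] + offset
                writes += 1
        elif path:
            # A single dictionary write for the starting value only.
            lengths[target] = lengths[n] + len(path)
            writes += 1
    return steps, writes

for mode in (True, False):
    steps, writes = count_work(100000, mode)
    print("backtrack =", mode, "steps =", steps, "dict writes =", writes)
```

The counts ignore timing entirely, so they only say how much work each variant does, not how expensive each unit of work is in Python.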
import time

def results(backtrackMemoization, start, maxChainValue, collatzDict):
    print()
    print(("with " if backtrackMemoization else "without ") + "backtracking memoization")
    print("length of " + str(collatzDict[maxChainValue[0]]) + " found for n = " + str(maxChainValue[0]))
    print("computed in " + str(round(time.time() - start, 3)) + " seconds")

def collatz(backtrackMemoization, start, maxChainValue, collatzDict):
    # Note: maxNum is read from the module-level variable set at the bottom.
    for target in range(1, maxNum):
        n = target
        if (backtrackMemoization):
            stack = []
        else:
            length = 0
        # Walk the chain until we reach a value that is already memoized.
        while (n not in collatzDict):
            if (backtrackMemoization):
                stack.append(n)
            else:
                length = length + 1
            if (n % 2):
                n = 3 * n + 1
            else:
                n = n // 2
        if (backtrackMemoization):
            # Memoize every value visited, nearest to the known value first.
            additionalLength = 1
            while (len(stack) > 0):
                collatzDict[stack.pop()] = collatzDict[n] + additionalLength
                additionalLength = additionalLength + 1
        else:
            # Memoize only the starting value.
            collatzDict[target] = collatzDict[n] + length
        if (collatzDict[target] > collatzDict[maxChainValue[0]]):
            maxChainValue[0] = target

def benchmarkAlgo(maxNum, backtrackMemoization):
    start = time.time()
    maxChainValue = [1]
    collatzDict = {1:0}
    collatz(backtrackMemoization, start, maxChainValue, collatzDict)
    results(backtrackMemoization, start, maxChainValue, collatzDict)

try:
    maxNum = int(input("enter upper bound> "))
    print("setting upper bound to " + str(maxNum))
except (ValueError, EOFError):
    # Non-integer or missing input: fall back to the default bound.
    maxNum = 100000
    print("defaulting upper bound to " + str(maxNum))

benchmarkAlgo(maxNum, True)
benchmarkAlgo(maxNum, False)
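For the "is Python slow with stacks" question, a micro-benchmark along these lines might isolate the per-step bookkeeping cost from everything else. It is only a sketch: `with_stack`, `with_counter` and the chain length of 100 are assumptions I picked for illustration, and it measures list append/pop against an integer increment in isolation, without any dictionary work.

```python
import timeit

def with_stack(steps):
    """Push one value per step, then pop them all (backtracking-style bookkeeping)."""
    stack = []
    for n in range(steps):
        stack.append(n)
    while stack:
        stack.pop()

def with_counter(steps):
    """Increment a counter once per step (simple-memoization bookkeeping)."""
    length = 0
    for _ in range(steps):
        length += 1
    return length

STEPS = 100  # assumed chain length, chosen arbitrarily for this sketch
for fn in (with_stack, with_counter):
    # Take the best of 5 repeats to reduce noise from other processes.
    best = min(timeit.repeat(lambda: fn(STEPS), number=10000, repeat=5))
    print(fn.__name__, round(best, 4), "seconds for 10000 calls")
```

Taking the minimum over several repeats is the usual way to reduce noise with `timeit`; a single `time.time()` measurement, as in my benchmark, is more exposed to whatever else the machine is doing.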