Longest Collatz (or Hailstone) sequence optimization - Python 2.7

Question

I've made a program that prints out a list of numbers, each one with a greater number of steps (according to the Collatz Conjecture) needed to reach 1 than the previous:

limit = 1000000000
maximum = 0
known = {}
for num in xrange(2, limit):
    start_num = num
    steps = 0
    while num != 1:
        if num < start_num:
            steps += known[num]
            break;
        if num & 1:
            num = (num*3)+1
            steps += 1
        steps += 1
        num //= 2
    known[start_num] = steps
    if steps > maximum:
        print start_num,"\t",steps
        maximum = steps

I cache the results I already know to speed up the program. This method works up to the limit of 1 billion, where my computer runs out of memory (8GB).

Is there a more efficient way to cache results?
Is there a way to further optimize this program?

Thank you in advance.

Well for starters you can immediately reduce numbers that are of the form (2^a)b to just b. So all even numbers reduce to all odds which you can cache from a prior iteration — Alex R., Aug 11 '16 at 00:48
So before caching a number check if it's a multiple of a power of 2? — SorenLantz, Aug 11 '16 at 00:57

Tim Peters · Answer 1 · 2016-08-13T18:44:48.750

It appears to be inherently hard to speed up Collatz programs enormously; the best programs I'm aware of are distributed, using idle cycles on hundreds (thousands ...) of PCs around the world.

There are some easy things you can do to optimize your program a little in pure CPython, although speed and space optimizations are often at odds:

Speed: a compute-heavy program in Python should always be written as a function, not as the main program. That's because local variable access is significantly faster than global variable access.
Space: making known a list instead of a dict requires significantly less memory. You're storing something for every number; dicts are more suitable for sparse mappings.
Space: an array.array requires less space still - although is slower than using a list.
Speed: for an odd number n, 3*n + 1 is necessarily even, so you can collapse 2 steps into 1 by going to (3*n + 1)//2 == n + (n >> 1) + 1 directly.
Speed: given a final result (number and step count), you can jump ahead and fill in the results for that number times all powers of 2. For example, if n took s steps, then 2*n will take s+1, 4*n will take s+2, 8*n will take s+3, and so on.

Here's some code with all those suggestions, although I'm using Python 3 (in Python 2, you'll at least want to change range to xrange). Note that there's a long delay at startup - that's the time taken to fill the large array with a billion 32-bit unsigned zeroes.

def coll(limit):
    from array import array
    maximum = 0
    known = array("L", (0 for i in range(limit)))
    for num in range(2, limit):
        steps = known[num]
        if steps:
            if steps > maximum:
                print(num, "\t", steps)
                maximum = steps
        else:
            start_num = num
            steps = 0
            while num != 1:
                if num < start_num:
                    steps += known[num]
                    break
                while num & 1:
                    num += (num >> 1) + 1
                    steps += 2
                while num & 1 == 0:
                    num >>= 1
                    steps += 1
            if steps > maximum:
                print(start_num, "\t", steps)
                maximum = steps
            while start_num < limit:
                assert known[start_num] == 0
                known[start_num] = steps
                start_num <<= 1
                steps += 1

coll(1000000000)

GETTING GONZO

A tech report written in 1992 gives many ways of speeding this kind of search: "3x+1 Search Programs", by Leavens and Vermeulen. For example, @Jim Mischel's "cut off based on previous peaks" idea is essentially the paper's Lemma 20.

Another: for an easy factor of 2, note that you can "almost always" ignore even starting numbers. Why: let s(n) denote the number of steps needed to reach 1. You're looking for new peaks in the value of s(). Suppose the most recent peak was found at n, and you're considering an even integer i with n < i < 2*n. Then in particular i/2 < n, so s(i/2) < s(n) (by the definition of "peak" and that a new peak was reached at n). But s(i) == s(i/2) + 1, so s(i) <= s(n): i cannot be a new peak.

So after finding a new peak at n, you can skip all even integers up to (but not including) 2*n.

There are many other useful ideas in the paper - but they're not all that easy ;-)

Jim Mischel · Accepted Answer · 2016-08-11T18:21:55.293

You only really need to cache the odd numbers. Consider in your program what happens when you start working on a number.

If you take your beginning number, X, and do a mod 4, you end up with one of four cases:

0 or 2: Repeatedly dividing by 2 will eventually result in an odd number that's less than X. You have that value cached. So you can just count the number of divides by 2, add that to the cached value, and you have the sequence length.
1: (3x+1)/2 will result in an even number, and dividing that by 2 again will result in a number that's less than X. If the result is odd, then you already have the cached value, so you can just add 3 to it. If the result is even, repeatedly divide by 2, until you get to an odd number (which you already have cached), add 3 and the number of divisions by 2 to the cached value, and you have the sequence length.
3: Do the standard Collatz sequence calculation until you get to a number that's less than the starting number. Then you either have the value cached or the number is even and you repeatedly divide by 2 until you get to an odd number.

This might slow your program a little bit because you have a few more divides by 2, but it doubles your cache capacity.

You can double your cache capacity again by only saving sequence lengths for numbers where x mod 4 == 3, but at the cost of even more processing time.

Those only give you linear increases in cache space. What you really need is a way to groom your cache so that you're not having to save so many results. At the cost of some processing time, you only need to cache the numbers that produce the longest sequences found so far.

Consider that when you compute that 27 has 111 steps, you have saved:

starting value, steps
1, 0
2, 1
3, 7
6, 8
7, 16
9, 19
18, 20
25, 23
27, 111

So when you see 28, you divide by 2 and get 14. Searching your cache, you see that the number of steps for 14 to go to 1 can't be more than 19 (because no number less than 18 takes more than 19 steps). So the maximum possible sequence length is 20. But you already have a maximum of 111. So you can stop.

This can cost you a little more processing time, but it greatly extends your cache. You'd only have 44 entries all the way up to 837799. See https://oeis.org/A006877.

It's interesting to note that if you do a logarithmic scatter plot of those numbers, you get a very close approximation of a straight line. See https://oeis.org/A006877/graph.

You could combine approaches by keeping a second cache that tells you, for numbers that are greater than the number with the current maximum, how many steps it took to get that number down below the current maximum. So in the above case, where 27 has the current maximum, you'd store 26 for the number 35, because it takes six operations (106, 53, 160, 80, 40, 20) to get 35 to 20. The table tells you that it can't take more than 20 more steps to get to 1, giving you a maximum possible of 26 steps. So if any other value reduces to 35, you add the current number of steps to 26, and if the number is less than 111, then you know you can't possibly have a new maximum with this number. If the number is greater than 111, then you have to continue to compute the entire sequence.

Whenever you find a new maximum, you add the number that generated it to your first cache, and clear the second cache.

This will be slower (my gut feel is that in the worst case it might double your processing time) than caching the results for every value, but it'll greatly extend your range.

The key here is that extending your range is going to come at the cost of some speed. That's a common tradeoff. As I pointed out above, you can do a number of things to save every nth item, which will give you an essentially larger cache. So if you save every 4th value, your cache is essentially 4 times as large as if you saved every value. But you reach the point of diminishing returns very quickly. That is, a cache that's 10 times larger than the original isn't a whole lot larger than the 9x cache.

My suggestion gives you essentially an exponential increase in cache space, at the cost of some processing time. But it shouldn't be a huge increase in processing time because in the worst case the number with the next maximum will be double the previous maximum. (Think of 27, with 111 steps, and 54, with 112 steps.) It takes a bit more code to maintain, but it should extend your range, which is currently only 30 bits, to well over 40 bits.

Success, thank you! I implemented the system you provided in the first part of your response in order to cut the cache size in half. With it I was able to successfully compute that 670617279 is the number under 1 billion that requires the most steps (986) to reach one. I am going to attempt all of the numbers under 10 billion next and if my computer can't take it I will implement more of your suggestions to make more room in the cache. Thank you again! — SorenLantz, Aug 11 '16 at 21:21
Just tried 10 billion and it crashed...continuing to implement — SorenLantz, Aug 11 '16 at 21:28
@SorenLantz: Storing only the odd numbers, you'll still need to save 5 billion numbers and their counts. That's going to take on the order of 40 GB. More, actually, because you'll need to start storing `long` values once you get over 4 billion. Closer to 60 GB. You'll also run into problems with maximum array size (can't have more than 2 billion items in an array). You'll have to get creative. — Jim Mischel, Aug 11 '16 at 21:32
You're right, my computer restarts the shell after hitting 60GB — SorenLantz, Aug 11 '16 at 21:36
@SorenLantz: By the way, keep a copy of your code that uses the old method, so you can use it to verify that the new caching method works. — Jim Mischel, Aug 11 '16 at 21:39
Already have, I learned that the hard way early on. Thank you for the reminder. — SorenLantz, Aug 12 '16 at 02:16
Another success, all record breakers up to 10 billion have been found. The total compute time was 20 hours using the caching of maximums only method, thank you again. — SorenLantz, Aug 13 '16 at 13:40
@SorenLantz, see the edit to my answer for a link to a very useful paper. You can likely speed up your program by a factor 10 by implementing only some of its ideas. I added an explanation for one way to get a factor of 2 that's very easy; some of the others are, of course, much more involved. — Tim Peters, Aug 13 '16 at 18:48
@SorenLantz, would you mind appending your new algorithm to your original post? — , Jan 31 '17 at 19:14

Longest Collatz (or Hailstone) sequence optimization - Python 2.7

2 Answers2

GETTING GONZO