C++ Collatz Conjecture Optimization

Question

In ProjectEuler problem #14, one needs to find the longest Collatz chain, up to 1 million. I found a halfway decent way to do so, however, it feels like I'm just being stupid because I can't find a way to make this code more efficient (the code is supposed to only print out the solution, after it tests 1 to 1 million, but hasn't printed anyting out after 10 minutes). Am I tackling this problem the wrong way, or is there a way to optimize my existing code?

#include <iostream>
using namespace std;

int main()
{
    int i;
    int x;
    int n;
    int superN;
    int superI;

    superN = 0;
    superI = 0;

    for (i = 1; i <= 1000000; i++) {
        x = i;
        n = 1;

        do {
            if (x % 2 == 0) {
                x = x / 2;
            }

            if (x % 2 == 1 && x != 1) {
                x = 3 * x + 1;
            }

            n++;

            if (n > superN) {
                superN = n;
                superI = i;
            }
        } while (x != 1);
    }

    cout << "The number " << superI << " ran for " << superN << " terms.";
    system("pause");
    return 0;
}

Memorize starting points you've already investigated and know the length of. When tracing from a new starting point, stop as soon as you hit the number you've investigated previously. — Igor Tandetnik, Jun 30 '16 at 04:54

score 3 · Accepted Answer · answered Jun 30 '16 at 05:08

3

You've got a few small problems:

I'm fairly sure that you are overflowing the int data type. Use a uint64_t to make this far less likely.
You should only update superI and superN outside of the while loop. This shouldn't matter, but it hurts performance.
On each iteration you should only modify x once. You currently might modify it twice, which might cause you to fall into an infinite loop. And your calculation of n will be off as well.
Use memoization to improve performance by caching old results.

Applying this, you could come up with some code like this:

#include <cstdint>
#include <iostream>
#include <map>
using namespace std;

int main()
{
    uint64_t i;
    uint64_t x;
    uint64_t n;
    uint64_t superN;
    uint64_t superI;

    std::map<uint64_t, uint64_t> memory;

    superN = 0;
    superI = 0;

    for (i = 1; i <= 1000000; i++) {
        x = i;
        n = 1;

        do {
            if (memory.find(x) != memory.end()) {
                n += memory[x];
                break;
            }

            if (x % 2 == 0) {
                x = x / 2;
            } else {
                x = 3 * x + 1;
            }

            n++;
        } while (x != 1);

        if (n > superN) {
            superN = n;
            superI = i;
        }

        memory[i] = n;
    }

    cout << "The number " << superI << " ran for " << superN << " terms.\n";
    system("pause");
    return 0;
}

Which takes 4 seconds to output:

The number 837799 ran for 556 terms.

answered Jun 30 '16 at 05:08

Bill Lynch

80,138
16
128
173

The real bug is the integer overflow. The sequential if only provides an incorrect count. Additionally, a much larger speed up would come from excluding even numbers -- all even numbers will always get smaller after their iteration and become an odd numbers at some point. http://pastebin.com/XgvCgUCa This is an example to demonstrate the point of odd numbered solutions -- runs in 0.18 seconds. – p4plus2 Jun 30 '16 at 05:15
@p4plus2: Consider the case where we are looking for the largest sequence starting with a number <= 20. In that case the correct answer is 18 which your solution wouldn't find. – Bill Lynch Jun 30 '16 at 05:26
Also using ``x & 1L`` instead of ``x % 2 == 0`` and then inverting the if logic would be faster. And ``x >>= 1`` instead of ``x = x / 2``. Bitwise operators generally are faster than arithmetic. – BrainStone Jun 30 '16 at 05:53
@BrainStone: Both of your examples will produce identical assembly code. – Bill Lynch Jun 30 '16 at 06:09
@BillLynch 19 also has 21 terms. But we can solve this problem: `while(i*2 <= MAX){ n++; i*=2; }` So in your case 9 will produce 20 steps 18 is less than max which will show 21 cycles for 18. This is still half the needed operations on average. – p4plus2 Jun 30 '16 at 07:26
This can be improved by using a stack. For example, when `i=3`, the stack will be: `3, 10, 5, 16` and it will stop there. We know that `16 -> 4`. Instead of only setting the collatz of 3 to 7, we can set `10 to 6`, `5 to 5`. For bigger values, this becomes more useful. – smttsp Jan 22 '20 at 22:27
This program ran 10 times faster without memoization on by computer: `g++ -std=gnu++11 3xplus1.cpp && time ./a.out` . Also `n=1` should be changed to `n=0` for the memoization to produce the correct output: 524, not 556. – Kjetil S. Aug 05 '21 at 22:31

score 0 · Answer 2 · answered Sep 11 '16 at 16:45

I would suggest not to use memoization as for me it run slower; in my case (up to 10,000,000) the code below is faster without memoization. the main changes are:

only testing if the current number is even once (not need for an else-if).
using a bitwise operation instead of the modulo operation (slightly faster)

Apart from that I don't know why your code is so long (mine is below 200 milliseconds) maybe you compile as debug ?

bool isEven(uint64_t value)
{
    return (!(value & 1));
}

uint64_t solveCollatz(uint64_t start)
{
    uint64_t counter = 0;
    while (start != 1)
    {
        if(isEven(start))
        { 
            start /= 2;
        }
        else
        {
            start = (3 * start) + 1;
        }
        counter++;
    }

    return counter;
}

score 0 · Answer 3 · answered May 17 '18 at 23:07

If you can use compiler intrinsics, particularly with counting and removing trailing zeros, you'll recognize you don't need to branch in the main loop, you'll always alternate odd and even. The memoization techniques that have been previously presented will rarely short-circuit around the math you're doing, since we're dealing with hailstone numbers - additionally, the majority of numbers only have one entry point, so if you see them once, you'll never see them again.

In GCC it'll look something like this:

#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <map>
using namespace std;

using n_iPair = std::pair<uint32_t, uint64_t>;

auto custComp = [](n_iPair a, n_iPair b){
  return a.first < b.first;
};

int main()
{
    uint64_t x;
    uint64_t n;
    n_iPair Super = {0,0};

    for (auto i = 1; i <= 1000000; i++){
        x = i;
        n = 0;

        if (x % 2 == 0) {
          n += __builtin_ctz(x); // account for all evens
          x >>= __builtin_ctz(x); // always returns an odd
        }

         do{ //when we enter we're always working on an odd number

          x = 3 * x + 1; // always increases an odd to an even
          n += __builtin_ctz(x)+1; // account for both odd and even transfer
          x >>= __builtin_ctz(x); // always returns odd

        }while (x != 1);

        Super = max(Super, {n,i}, custComp);

    }

    cout << "The number " << Super.second << " ran for " << Super.first << " terms.\n";
    return 0;
}

score 0 · Answer 4 · answered Mar 26 '19 at 15:36

If performance is critical, but memory isn't, you can use caching to improve the speed.

#include <iostream>
#include <chrono>
#include <vector>
#include <sstream>

std::pair<uint32_t, uint32_t> longestCollatz(std::vector<uint64_t> &cache)
{
    uint64_t length = 0;
    uint64_t number = 0;

    for (uint64_t current = 2; current < cache.size(); current++)
    {
        uint64_t collatz = current;
        uint64_t steps = 0;
        while (collatz != 1 && collatz >= current)
        {
            if (collatz % 2)
            {
                // if a number is odd, then ((collatz * 3) + 1) would result in
                // even number, but even number can have even or odd result,  so
                // we can combine two steps for even number, and increment twice.
                collatz = ((collatz * 3) + 1) / 2;
                steps += 2;
            }
            else
            {
                collatz = collatz / 2;
                steps++;
            }
        }

        cache[current] = steps + cache[collatz];

        if (cache[current] > length)
        {
            length = cache[current];
            number = current;
        }
    }
    return std::make_pair(number, length);
}

int main()
{
    auto start = std::chrono::high_resolution_clock::now();;

    uint64_t input = 1000000;
    std::vector<uint64_t> cache(input + 1);
    auto longest = longestCollatz(cache);

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << "Longest Collatz (index : value) --> " << longest.first << " : " << longest.second;
    std::cout << "\nExecution time: " << duration << " milliseconds\n";

    return EXIT_SUCCESS;
}

I wonder if the vector of 1 mil of uint32 will initialize slower than simply running the algorithm... — a_girl, Sep 13 '21 at 22:42

C++ Collatz Conjecture Optimization

4 Answers4