2

I'm trying to write an algorithm that iterates the Collatz sequence for a given n. This is the code so far:

while (n > 1) {
   n % 2 == 0 ? n /= 2 : n = n * 3 + 1;
}

I was wondering whether this can be optimized any further, since efficiency and speed are crucial here. I've heard about branchless programming, but I'm not sure how to implement it, or whether it's worth it to begin with.

Existentialist
  • 1) Just considering your question as is, whether this is worth it or not depends on the machine you're running on. Many chips today will do eager execution (execute ahead on _both_ sides of a conditional branch and only commit the one which turns out to be needed) and the operations for Collatz, especially if you (or your compiler) translate them to shifts and adds, are simple in the integer ALUs, so there may not be any need at all on those machines to worry about this. – davidbak Jan 09 '22 at 18:33
  • 2) The context counts too. You wouldn't care if you were only doing a few of these. So you must be trying lots and lots of cases. LOTS AND LOTS! If so, the much bigger win would be to look at SIMD (if not GPU) execution to do the evaluation on several (many, for GPU) separate `n` simultaneously. – davidbak Jan 09 '22 at 18:35
  • 4
    You can replace this entire loop with `n = 1` if n is a 64 bit (or smaller) positive integer – Artyer Jan 09 '22 at 18:45
  • @Artyer - oh sure, I have an optimizing pass for LLVM that recognizes this and does that ... but actually, has Collatz been exhaustively tested to 2^64? – davidbak Jan 09 '22 at 18:49
  • @davidbak According to https://en.wikipedia.org/wiki/Collatz_conjecture#Experimental_evidence it's been tested up to 2^68 by the algorithm used in this paper: https://link.springer.com/article/10.1007%2Fs11227-020-03368-x – Artyer Jan 09 '22 at 18:50
  • @Artyer - further work is progressing on that by the author! here's his current dashboard: http://collatz-problem.org/ - with source code here: https://github.com/xbarin02/collatz/ - the dashboard shows he's reached 2^69 with an estimated 3 years left to get to 2^70 ... – davidbak Jan 09 '22 at 18:57

2 Answers

4

Sure. You need the loop, of course, but the work inside can be done like this:

n /= (n&-n);  // remove all trailing 0s
while(n > 1) {
    n = 3*n+1;
    n /= (n&-n);  // remove all trailing 0s
}

It also helps that this technique does all the divisions by 2 at once, instead of requiring a separate iteration for each of them.
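
To make the `n & -n` step concrete, here is a minimal sketch. It assumes `n` is an unsigned 64-bit value (the snippet above does not state a type), and the helper names `lowest_set_bit` and `odd_steps` are made up for illustration. The first helper isolates the largest power of two dividing `n`; the second counts how many `3*n+1` steps the loop above performs before reaching 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the lowest set bit of n, i.e. the largest power of
// two that divides n. Written as 0 - n rather than -n only to avoid
// compiler warnings about unary minus on an unsigned type; the value is
// the same as n & -n.
inline std::uint64_t lowest_set_bit(std::uint64_t n) {
    return n & (0 - n);
}

// Hypothetical helper: number of 3*n+1 steps taken by the loop above.
// Overflow of 3*n+1 is not checked, as in the original snippet.
inline int odd_steps(std::uint64_t n) {
    assert(n != 0);  // n == 0 would divide by zero below
    n /= lowest_set_bit(n);  // remove all trailing 0s
    int steps = 0;
    while (n > 1) {
        n = 3 * n + 1;
        n /= lowest_set_bit(n);  // remove all trailing 0s
        ++steps;
    }
    return steps;
}
```

For example, `lowest_set_bit(80)` is 16, so 80 goes straight to 5, and `odd_steps(80)` is 1 because 5 becomes 16, which collapses to 1 in a single division.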

Matt Timmermans
  • Division is typically expensive though, so this may defeat the motivation for making it branchless. – Eric Postpischil Jan 09 '22 at 18:41
  • How does this work? I get the wrong result when [trying it out](https://godbolt.org/z/1enP1ToW5). – Ted Lyngmo Jan 09 '22 at 18:43
  • @TedLyngmo: It collapses all consecutive divisions by two into a single division by the greatest power of two that divides n. (`n & -n` evaluates to the lowest bit set in `n`, when two’s complement is used.) – Eric Postpischil Jan 09 '22 at 18:44
  • @EricPostpischil Sorry, I was unclear. How is it supposed to be used? When I used it (as I did in the link) it gave me the wrong result. – Ted Lyngmo Jan 09 '22 at 18:45
  • @TedLyngmo your implementation will only print out the odd numbers, because the `n /= (n&-n)` does all the divisions by 2 at once. – Matt Timmermans Jan 09 '22 at 18:46
  • I believe the idea is that it collapses successive divisions by 2 into a single operation. So for instance instead of `80 40 20 10 5` you'd just go directly from `80` to `5`. – Nathan Pierson Jan 09 '22 at 18:46
  • Ahhh... now I get it! Nice! Have an upvote! :-) – Ted Lyngmo Jan 09 '22 at 18:47
  • @EricPostpischil I don't know how slow division is these days, but I expect it's faster than a loop. Also, if `n` is unsigned, there's a good chance that the compiler can replace it with `bsf` and shift. – Matt Timmermans Jan 09 '22 at 18:51
  • 1
    `n >>= std::countr_zero(n)` could be faster to remove trailing zeros, and I think it's more readable as well. If you don't have C++20, you can most likely use `__builtin_ctz(n)` instead. – IlCapitano Jan 09 '22 at 21:29
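
The shift-based suggestion in the last comment could look roughly like the sketch below. It assumes an unsigned 64-bit `n`, uses `std::countr_zero` when compiled as C++20, and otherwise falls back to `__builtin_ctzll`, which assumes GCC or Clang (for a 64-bit value the `ll` variant is needed rather than `__builtin_ctz`, which takes an `unsigned int`). The helper names are hypothetical. Whether this beats the division is something to measure on the target machine.

```cpp
#include <cassert>
#include <cstdint>
#if __cplusplus >= 202002L
#include <bit>
#endif

// Hypothetical helper: number of trailing zero bits of a non-zero value.
inline int trailing_zeros(std::uint64_t n) {
#if __cplusplus >= 202002L
    return std::countr_zero(n);   // C++20
#else
    return __builtin_ctzll(n);    // GCC/Clang builtin; undefined for n == 0
#endif
}

// Hypothetical helper: divide out every factor of two with a single shift.
inline std::uint64_t strip_twos(std::uint64_t n) {
    assert(n != 0);  // a shift by 64 bits would be undefined behaviour
    return n >> trailing_zeros(n);
}

// Hypothetical helper: count the 3*n+1 steps needed to reach 1.
// Overflow of 3*n+1 is not checked.
inline int odd_steps_shift(std::uint64_t n) {
    int steps = 0;
    n = strip_twos(n);
    while (n > 1) {
        n = strip_twos(3 * n + 1);
        ++steps;
    }
    return steps;
}
```

For example, `strip_twos(80)` is 5, and `odd_steps_shift(7)` is 5 because the odd values visited are 7, 11, 17, 13, 5 and then 1.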
1

One way to make it branchless (except for the loop condition) is to multiply n / 2 by the result of n % 2 == 0 (1 for true, 0 for false), multiply (n * 3 + 1) by the negation of that result, and add the two products together.

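#include <iostream>

void collatz(unsigned long long n) {
    std::cout << n << '\n';
    while (n > 1) {
        // m is true (1) when n is even, false (0) when n is odd
        auto m = n % 2 == 0;
        // m and !m select exactly one of the two terms, so no branch is needed
        n = m * (n / 2) + !m * (n * 3 + 1);
        std::cout << n << '\n';
    }
}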
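
A variation on the same idea, offered only as a sketch: instead of multiplying by 0 or 1, build an all-ones or all-zeros mask from the low bit and select with bitwise operations. The helper names `collatz_step` and `collatz_steps` are hypothetical, and the 64-bit unsigned type is an assumption. A compiler may already turn the ternary or the multiply form into a conditional move, so this is not guaranteed to be faster.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: one Collatz step without a data-dependent branch.
inline std::uint64_t collatz_step(std::uint64_t n) {
    const std::uint64_t odd = n & 1;         // 1 if n is odd, 0 if even
    const std::uint64_t mask = 0 - odd;      // all ones if odd, all zeros if even
    const std::uint64_t half = n >> 1;       // result for even n
    const std::uint64_t triple = 3 * n + 1;  // result for odd n; may wrap, not checked
    return (half & ~mask) | (triple & mask); // keep exactly one of the two
}

// Hypothetical helper: total number of steps until n reaches 1.
inline int collatz_steps(std::uint64_t n) {
    assert(n != 0);  // 0 never reaches 1
    int steps = 0;
    while (n > 1) {  // the loop condition is still a branch
        n = collatz_step(n);
        ++steps;
    }
    return steps;
}
```

For example, `collatz_step(6)` is 3 and `collatz_step(7)` is 22, and `collatz_steps(6)` is 8 (6, 3, 10, 5, 16, 8, 4, 2, 1).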


Ted Lyngmo