8

Magic Numbers

A positive integer is “magic” if, and only if, it can be reduced to 1 by repeatedly dividing it by 2 if it’s even or multiplying it by 3 and then adding 1 if it’s odd. So, for example, 3 is magic because 3 reduces first to 10 (3*3+1), then to 5 (10/2), then to 16 (5*3+1), then to 8 (16/2), then to 4 (8/2), then to 2 (4/2), and finally to 1 (2/2). The magic numbers hypothesis states that all positive integers are magic, or, formally: ∀x ∈ Z, MAGIC(x) where MAGIC(x) is the predicate “x is magic”.

We are supposed to develop a C++ program to find "Magic Numbers" from 1 to 5 Billion. It is supposed to take 80 seconds or less if being done correctly. Mine takes approximately 2 hours, and the example program we were given would take 14 days. Here is my code, what can I do to speed it up? Am I missing obvious optimization issues?

#include <iostream>
using namespace std;

bool checkMagic(unsigned long number);

int main()
{
  for(unsigned long i = 1; i <= 5000000000; i++)
  {
    if(i % 1000000 == 0)
    {
      //only print out every 1000000 numbers to keep track, but slow it down 
      //by printing out every single number
      cout<<i<<endl;
    }

    if(!checkMagic(i))
    {
      //not magic
      cout<<i<<" not a magic number"<<endl;
      break;
    }
  }
}

bool checkMagic(unsigned long number)
{
  if(number % 2 == 0)
  {
    number = number / 2;
  }
  else
  {
    number = (number * 3) + 1;
  }

  if(number !=  1)
  {
    checkMagic(number);
  }

  return 1;
}
Chaz
  • 83
  • 1
  • 6
  • 17
    Better ask at [SE Code Review](http://codereview.stackexchange.com/) to improve working code please. – πάντα ῥεῖ Sep 09 '16 at 19:36
  • 9
    The trick to making the program fast is to remember all the previous magic numbers you've found. If you encounter a previously seen magic number in the process of multiplying and dividing, you can stop immediately. This is called *memoization*. – Barmar Sep 09 '16 at 19:37
  • 1
    Well for one thing I would get rid of the recursion. The compiler may be optimizing it away but by using a loop you know you have a loop and you wont have all of the function calls to make. – NathanOliver Sep 09 '16 at 19:37
  • In C++ you can use a `std::set` to remember all the magic numbers you've seen before. – Barmar Sep 09 '16 at 19:39
  • 10
    Your `checkMagic` always returns `1`. You aren't even using the result of the recursion. – Kevin Sep 09 '16 at 19:40
  • 3
    You want to find which of the numbers from 1 to 5 billion are "magic" ? Easy : all of them. Look up the [Collatz Conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture). – Nelfeal Sep 09 '16 at 19:44
  • 1
    @Barmar A set of 5 billion `unsigned long` ? That's like 40 or 80 GiB (counting the inner pointers)... – Nelfeal Sep 09 '16 at 19:47
  • 1
    a bit string of 5 biliion bits is just ~600Mb - it's not a problem to keep a track of all previously found magic numbers – Serge Sep 09 '16 at 19:51
  • If the size of the set is an issue, you could only do that check when the recursion reaches a number less than some threshold like `1,000,000`. – Barmar Sep 09 '16 at 19:53
  • 1
    It would be better to track all of the non-magic numbers (there shouldn't be any). If while iterating you reach a number smaller than the initial number (since you've seen all of those already) then return false if you already know it isn't a magic number, otherwise return true. – Kevin Sep 09 '16 at 19:54
  • What about parallel approach? – BiagioF Sep 09 '16 at 19:54
  • @Serge That's fine, and to be honest, a vector of 5 billion bytes for performance would be fine too (5GiB isn't hard to come by nowadays). However, there are way more efficient methods, like for example just keeping track of the last level of the Collatz tree. – Nelfeal Sep 09 '16 at 19:55
  • Look at [memoization](https://en.wikipedia.org/wiki/Memoization). – Jarod42 Sep 09 '16 at 19:56
  • @Kevin, yes, knowing the fact that it is `Collatz Conjecture`. We assume no such knowledge, otherwise the task is just to write `isMagic(int x) { return x > 0)` – Serge Sep 09 '16 at 19:56
  • This isn't assuming that knowledge, at least not directly. We know that the list of non-magic numbers will be 0, but we can still "pretend" to not know that and search through the list anyway. The point is once we start checking the "magic" status of N we already know the "magic" status of 1 through N-1. – Kevin Sep 09 '16 at 19:57
  • @Nelxiost, I had in mind a lot of optimizations at time of writing that comment: I just want people to learn something by trial and error approach) – Serge Sep 09 '16 at 19:59
  • @Kevin wow I did not expect this to get so many comments so fast, but another person in the class gave me some hints to how he solved it. And you are right. Instead of keeping track of every number, since it checks every number in order, if the number that is being divided and multiplied goes below what it started as, it has already been checked; every number below what is being checked already has been checked.The program now completes in 30 seconds with this implemented. – Chaz Sep 09 '16 at 20:04
  • @Chaz you still need to have a code that works if some non-magic number is found in the past: otherwise as soon as you find a single non-magic you can't say anything about the rest up to 5bln – Serge Sep 09 '16 at 20:07
  • _We are supposed to develop a C++ program to find "Magic Numbers" from 1 to 5 Billion_ I understand this is an exercise, but actually your program can work only if you already know that every integer up to 5 billion is a magic number. If not, when you meet your "not magic number" your checkMagic will end in an infinite recursion – Gian Paolo Sep 09 '16 at 20:10
  • @Chaz, btw, there is no need for recursion at all – Serge Sep 09 '16 at 20:12
  • BTW, a `bool` function returns `true` or `false`. The `1` is an integer and not a `bool` value. – Thomas Matthews Sep 09 '16 at 20:18
  • You can start by making sure your build has compiler optimizations enabled ;-) – Jesper Juhl Sep 09 '16 at 20:23
  • I'm voting to close this question as off-topic because this question belongs on SE code review. – rubenvb Sep 22 '16 at 08:40
  • I'm voting to close this question as off-topic because goes on CR – djechlin Dec 17 '16 at 23:08
  • Note that if you prove that a number is a magic number, then you also prove that all numbers in that sequence are magic numbers. For example, `1 -> 4 -> 2 -> 1`, therefore `1`, `2`, and `4` are all magic. Similarly, as you pointed out, `3` proves that `3 -> 10 -> 5 -> 16 -> 8 -> 4` are all magic. Therefore, since `1`, `2`, `3`, `4`, and `5` are magic, (for example) any number _n_ that reduces to _n_ `<= 5` must also be magic, as must all numbers in its sequence. You can use things like this to short-circuit your logic, if you keep track of them. – Justin Time - Reinstate Monica Dec 18 '16 at 02:21

4 Answers4

4

This question basically asks to verify the Collatz Conjecture up to 5B.

I think the key here is, for each number n we are checking, to view the optimistic scenario and the pessimistic scenario, and to check the optimistic one before reverting to the pessimistic one.

In the optimistic scenario, as we we modify n according to the n / 2 ; 3n + 1 rule, the sequence of numbers will:

  • in a finite number of steps become smaller than n (in which case we can check what we know about that smaller number).

  • will not cause an overflow in the steps.

(on which, as TonyK correctly notes, we cannot rely (even not on the first)).

So, for the optimistic scenario, we can use the following function:

#include <unordered_set>
#include <set>
#include <iostream>
#include <memory>
#include <list>
#include <gmp.h>

using namespace std;

using found_set = unordered_set<size_t>;

bool fast_verify(size_t i, size_t max_tries, const found_set &found) {
    size_t tries = 0;
    size_t n = i;
    while(n != 1) {
        if(++tries == max_tries ) 
            return false;

        if(n < i)
            return found.empty() || found.find(i) == found.end();
        if(n % 2 == 0)
            n /= 2;
        else if(__builtin_mul_overflow (n, 3, &n) || __builtin_add_overflow(n, 1, &n))
            return false;
    }   

    return true;
}

Note the following:

  1. The function only attempts to verify the conjecture for the number it receives. If it returns true, it has been verified. If it returns false, it just means it hasn't been verified (i.e., it does not mean it has been disproved).

  2. It takes a parameter max_tries, and only verifies up to this number of steps. If the number has been exceeded, it does not try to discern whether this is part of an infinite loop or not - it just returns that verification failed.

  3. It takes an unordered_set of known numbers that have failed (of course, if the Collatz conjecture is true, this set will always be empty).

  4. It detects overflow via __builtin_*_overflow. (Unfortunately, this is gcc specific, then. A different platform might require a different set of such functions.)

Now for the slow-but-sure function. This function uses the GNU MP multi-precision arithmetic library. It checks for an infinite loop by maintaining the sequence of numbers it has already encountered. This function returns true if the conjecture has been proved for this number, and false if has been disproved for this number (note the difference from the previous fast verification).

bool slow_check(size_t i) {
    mpz_t n_; 
    mpz_init(n_);

    mpz_t rem_;
    mpz_init(rem_);

    mpz_t i_; 
    mpz_init(i_);

    mpz_set_ui(i_, i); 
    mpz_set_ui(n_, i); 

    list<mpz_t> seen;

    while(mpz_cmp_ui(n_, 1) != 0) {
        if(mpz_cmp_ui(n_, i) < 0)
            return true;
        mpz_mod_ui(rem_, n_, 2); 
        if(mpz_cmp_ui(rem_, 0) == 0) {
            mpz_div_ui(n_, n_, 2); 
        }   
        else {
            mpz_mul_ui(n_, n_, 3); 
            mpz_add_ui(n_, n_, 1); 
       }   
       seen.emplace_back(n_);
       for(const auto &e0: seen)
           for(const auto &e1: seen)
               if(&e0 != &e1 && mpz_cmp(e0, e1) == 0)
                   return false;
   }   

   return true;
}

Finally, main maintains an unordered_set of the disproven numbers. for each number, it optimistically tries to verify the conjecture, then, if it fails (for overflow or exceeding the number of iterations), uses the slow method:

int main()
{
    const auto max_num = 5000000000;
    found_set found;

    for(size_t i = 1; i <= max_num; i++) {
        if(i % 1000000000 == 0)
            cout << "iteration " << i << endl;

        auto const f = fast_verify(i, max_num, found);
        if(!f && !slow_check(i))
            found.insert(i);
    }

    for(auto e: found)
        cout << e << endl;
}

Running the full code (below) gives:

$ g++ -O3 --std=c++11 magic2.cpp -lgmp && time ./a.out
iteration 1000000000
iteration 2000000000
iteration 3000000000
iteration 4000000000
iteration 5000000000

real    1m3.933s
user    1m3.924s
sys 0m0.000s

$ uname -a
Linux atavory 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ sudo lshw | grep -i cpu
      *-cpu
           description: CPU
           product: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
           bus info: cpu@0
           version: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
           capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts cpufreq

I.e., no disproven numbers found, and the running time at ~64 seconds.


Full code:

#include <unordered_set>
#include <set>
#include <iostream>
#include <memory>
#include <list>
#include <gmp.h>

using namespace std;

using found_set = unordered_set<size_t>;

bool fast_verify(size_t i, size_t max_tries, const found_set &found) {
    size_t tries = 0;
    size_t n = i;
    while(n != 1) {
        if(++tries == max_tries )
            return false;

        if(n < i)
            return found.empty() || found.find(i) == found.end();
        if(n % 2 == 0)
            n /= 2;
        else if(__builtin_mul_overflow (n, 3, &n) || __builtin_add_overflow(n, 1, &n))
            return false;
    }   

    return true;
}

bool slow_check(size_t i) {
    mpz_t n_; 
    mpz_init(n_);

    mpz_t rem_;
    mpz_init(rem_);

    mpz_t i_; 
    mpz_init(i_);

    mpz_set_ui(i_, i); 
    mpz_set_ui(n_, i); 

    list<mpz_t> seen;

    while(mpz_cmp_ui(n_, 1) != 0) {
        if(mpz_cmp_ui(n_, i) < 0)
            return true;
        mpz_mod_ui(rem_, n_, 2); 
        if(mpz_cmp_ui(rem_, 0) == 0) {
            mpz_div_ui(n_, n_, 2); 
        }   
        else {
            mpz_mul_ui(n_, n_, 3); 
            mpz_add_ui(n_, n_, 1); 
       }   
       seen.emplace_back(n_);
       for(const auto &e0: seen)
           for(const auto &e1: seen)
               if(&e0 != &e1 && mpz_cmp(e0, e1) == 0)
                   return false;
   }   

   return true;
}


int main()
{
    const auto max_num = 5000000000;
    found_set found;

    for(size_t i = 1; i <= max_num; i++) {
        if(i % 1000000000 == 0)
            cout << "iteration " << i << endl;

        auto const f = fast_verify(i, max_num, found);
        if(!f && !slow_check(i))
            found.insert(i);
    }

    for(auto e: found)
        cout << e << endl;

    return 0;
}
2785528
  • 5,438
  • 2
  • 18
  • 20
Ami Tavory
  • 74,578
  • 11
  • 141
  • 185
  • 2
    It's a little confusing having two `num`s. – Mark Setchell Sep 09 '16 at 20:34
  • @MarkSetchell You are absolutely correct - many thanks. Corrected. – Ami Tavory Sep 09 '16 at 20:35
  • yes, thank you. I changed my code and on the university linux machines it runs in ~31 seconds (I have no idea of the specs, I just ssh'd into it) – Chaz Sep 09 '16 at 20:43
  • Can `unsigned long` count past 4,294,967,295 on your system? – wally Sep 09 '16 at 20:51
  • @flatmouse Ooh, nice point! Changed everything to `size_t` (didn't affect the execution). – Ami Tavory Sep 09 '16 at 20:54
  • 2
    This does not, in fact, work. Consider what would happen if there _were_ a non-magic number below `max_num`: how would you detect this? You wouldn't, because either (i) `num` would overflow, probably to a magic number; or (ii) the program would enter an endless loop. I suppose you could say you had detected it in case (ii), but certainly not in case (i). – TonyK Sep 09 '16 at 21:09
  • 2
    As the while loop has no side effect, compiler may ignore that loop completely BTW. – Jarod42 Sep 09 '16 at 21:21
  • @Jarod42: so that's why it runs so fast! – TonyK Sep 09 '16 at 21:25
  • 1
    @Jarod42 Your comment was correct, but I then added side-effects, and that hardly affected the timing. In any case, I've subsequently completely rewritten the answer to address TonyK's correct comments using a first-optimstic-then-pessimistic approach. That increased the running time by ~20 seconds. I'd appreciate it if you could have another look at the answer. Thanks. – Ami Tavory Sep 22 '16 at 08:35
  • @DOUGLASO.MOEN Thanks. Could you please paste the full error? I don't see it. The warnings indeed appear by me with `-Wall` (and indeed, it's better to get rid of them, and I'll do that). – Ami Tavory Oct 05 '16 at 11:47
  • my options: g++ -m64 -O3 -ggdb -std=c++14 -Wall -Wextra -Wshadow -Wnon-virtual-dtor -pedantic -Wcast-align -Wcast-qual -Wconversion -Wpointer-arith -Wunused -Woverloaded-virtual -O0 dumy432.cc -o dumy432 -L../../bag -lbag_i686 -lgmpxx -lgmp – 2785528 Oct 05 '16 at 21:32
  • @DOUGLASO.MOEN Yes, you are absolutely right - thanks! (I must also add, holy crap! as you can see by the downvotes to the previous version, and by your comment, this answer turned out to plague me more than I thought when I first answered it). At the end of each of the two functions first functions, `return true;` needs to be added. By the C++ standard, main doesn't need a `return 0;` I rebuilt and ran with your correct suggestions (thanks again). – Ami Tavory Oct 05 '16 at 21:49
  • @DOUGLASO.MOEN No problem. If you see, I've replaced your helpful markers by the corrected code. Thanks again for your corrections. – Ami Tavory Oct 05 '16 at 21:56
  • @DOUGLASO.MOEN Added it at the bottom of the timing results. I've had bad experiences with Dells, FWIW. – Ami Tavory Oct 05 '16 at 22:08
2

Your code, and Ami Tavory's answer, completely ignore the issue of overflow. If there were a non-magic number in your range, then the Collatz iteration would either enter a loop, or increase without bound. If it increases without bound, it will certainly overflow, whatever the size of your unsigned long; and if it enters a loop, it might well overflow before it gets there. So you must put an overflow check in there:

#include <limits.h>
...
if (number > (ULONG_MAX - 1) / 3)
    PrintAnApologyAndDie() ;
number = (number * 3) + 1;
TonyK
  • 16,761
  • 4
  • 37
  • 72
0

what can I do to speed it up? Am I missing obvious optimization issues?

  • for(unsigned long i = 1; i <= 5000000000; i++)

A for-loop is not the fastest coding style to invoke a method 5B times.

In my code see unwind100() and innerLoop(). The unwind'ing I coded removed 99% of the innerLoop overhead, which saved more than 5 seconds (on my desktop DD5). Your savings may vary. Wikipedia has an article about unwinding a loop, with a brief description of the savings potential.


  • if (i % 1000000 == 0)

This code apparently generates progress reports. At a cost of 5 billion comparisons, it accomplishes nothing for the magic number checks.

See my code outerLoop, which calls innerLoop 10 times. This provides a convenient location for 1 progress report each outer loop. Consider splitting your loops into an 'innerLoop' and an 'outerLoop'. Testing 5 Billion times is more expensive than adding 10 calls to innerLoops.


  • As in the comments, your checkMagic() return's '1' regardless of test results. A simple error, but I do not know if this error affects performance.

I think your checkMagic() does not achieve tail-recursion, which really will slow your code down.

See my method checkMagicR(), which has tail recursion, and returns a true or false.

Also, in your CheckMagic(), in the clause for odd input, you ignore the idea that (3n+1) must be an even number when n is positive and odd. My code performs an 'early-div-by-2', thus reducing the future effort when possible.


  • In my methods unwind10(), note that my code exploits the known even/odd pattern of input the for the sequence (2..5 Billion), by using checkMagicRO() (for odd inputs) and checkMagicRE() (for even).

In my earlier versions, the code wasted time detecting that the known even input was even, then dividing it by 2, and then returning true. checkMagicRE() skips the test and divide, saving time.

CheckMagicRO() skips the test for odd, then performs the odd stuff and continues into checkMagicR().


  • if(number != 1) { checkMagic(number); }

Your recursion continues all the way down to 1 ... not necessary, and potentially the biggest hit to overall performance.

See my checkMagicR() "recursion termination clause". My code stops when ((n/2) < m_N), m_N being the test value that started this recursion. The idea is that all smaller test inputs are already proven magic, else the code would have asserted. It also halts when wraparound is imminent ... which has not happened.


Results:

My desktop (DD5) is older, slower, and running Ubuntu 15.10 (64). This code was developed with g++ v5.2.1, and only finishes without timeout assert using -O3.

DD5 is apparently 2x slower than the machine used by Amy Tavory (comparing how his code ran on DD5).

On DD5, my code completes in ~44 seconds in mode 1, and 22 seconds in mode 2.

A faster machine might complete this code in 11 seconds (in mode 2).


Full Code:

// V9

#include <chrono>
#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>
#include <thread>
#include <vector>
#include <cassert>

// chrono typedef - shorten one long name
typedef std::chrono::high_resolution_clock  Clock_t;


class T431_t
{

private:
   uint64_t           m_N;             // start-of-current-recursion
   const uint64_t     m_upperLimitN;   // wrap-around avoidance / prevention

   std::stringstream  m_ssMagic;       // dual threads use separate out streams

   uint64_t           m_MaxRcrsDpth;   // diag only
   uint64_t           m_Max_n;         // diag only
   uint64_t           m_TotalRecurse;  // diag only

   Clock_t::time_point  m_startMS;     // Derived req - progress info

   std::stringstream  m_ssDuration;

   std::ostream*      m_so;            // dual threads - std::cout or m_ssMagic

   uint64_t           m_blk;

   const int64_t      m_MaxThreadDuration; // single Max80KMs, dual Max40KMs


   // enum of known type (my preference for 'internal' constants)
   enum T431_Constraints : uint64_t
   {
      ZERO     = 0,
      B00      = 1,    // least-significant-bit is bit 0
      ONE      = 1,    // lsbit set
      TWO      = 2,
      THREE    = 3,
      //
      K        = 1000, // instead of 1024,
      //
      BlkSize  = ((K * K * K) / 2),  // 0.5 billion
      //
      MaxSingleCoreMS = (80 * K), // single thread max
      MaxDualCoreMS   = (40 * K)  // dual thread max
   };

   // convenience method:  show both hex and decimal on one line (see dtor)
   inline std::string showHexDec(const std::string& label, uint64_t val) {
      std::stringstream ss;
      ss << "\n" << label
         << " = 0x" << std::hex << std::setfill('0') << std::setw(16) << val
         << "  ("   << std::dec << std::setfill(' ') << std::setw(22) << val << ")";
      return (ss.str());
   }

   // duration: now() - m_startMS
   std::chrono::milliseconds  getChronoMillisecond() {
      return (std::chrono::duration_cast<std::chrono::milliseconds>
              (Clock_t::now() - m_startMS));
   }

   // reports milliseconds since m_startMS time
   void ssDurationPut (const std::string& label)
      {
         m_ssDuration.str(std::string());  // empty string
         m_ssDuration.clear();             // clear ss flags

         std::chrono::milliseconds  durationMS = getChronoMillisecond();

         uint64_t  durationSec = (durationMS.count() / 1000);
         uint64_t durationMSec = (durationMS.count() % 1000);

         m_ssDuration
            << label << std::dec << std::setfill(' ') << std::setw(3) << durationSec
            << "."   << std::dec << std::setfill('0') << std::setw(3) << durationMSec
            << " sec";
      }

   inline void reportProgress()
      {
         std::chrono::milliseconds  durationMS = getChronoMillisecond();

         assert(durationMS.count() < m_MaxThreadDuration); // normal finish -> "no endless loop"
         //
         uint64_t  durationSec = (durationMS.count() / 1000);
         uint64_t durationMSec = (durationMS.count() % 1000);

         *m_so << " m_blk = " << m_blk
               << "  m_N = 0x" << std::setfill('0') << std::hex << std::setw(16) << m_N
               << "  " << std::dec << std::setfill(' ') << std::setw(3) << durationSec
               << "."  << std::dec << std::setfill('0') << std::setw(3) << durationMSec
               << " sec  (" << std::dec << std::setfill(' ') << std::setw(10) << m_N << ")"
               << std::endl;
      }


public:

   T431_t(uint64_t maxDuration = MaxSingleCoreMS) :
      m_N                 (TWO),  // start val of current recursion
      m_upperLimitN       (initWrapAroundLimit()),
      // m_ssMagic        // default ctor ok - empty
      m_MaxRcrsDpth       (ZERO),
      m_Max_n             (TWO),
      m_TotalRecurse      (ZERO),
      m_startMS           (Clock_t::now()),
      // m_ssDuration     // default ctor ok - empty
      m_so                (&std::cout), // default
      m_blk               (ZERO),
      m_MaxThreadDuration (maxDuration)
      {
         m_ssMagic.str().reserve(1024*2);
         m_ssDuration.str().reserve(256);
      }

   ~T431_t()
      {
         // m_MaxThreadDuration
         // m_blk
         // m_so
         // m_ssDuration
         // m_startMS
         // m_Max_n
         // m_TotalRecurse
         // m_MaxRcrsDpth
         if (m_MaxRcrsDpth) // diag only
         {
            ssDurationPut ("\n      duration = " );

            std::cerr << "\n T431() dtor        "
               //
                      << showHexDec("         u64MAX",
                                    std::numeric_limits<uint64_t>::max()) << "\n"
               //
                      << showHexDec("        m_Max_n", m_Max_n)
                      << showHexDec("  (3*m_Max_n)+1", ((3*m_Max_n)+1))
                      << showHexDec("    upperLimitN", m_upperLimitN)
               //                        ^^^^^^^^^^^ no wrap-around possible when n is less
               //
                      << "\n"
                      << showHexDec("  m_MaxRcrsDpth",  m_MaxRcrsDpth)
                      << "\n"
                      << showHexDec(" m_TotalRecurse",  m_TotalRecurse)
               //
                      << m_ssDuration.str() << std::endl;
         }
         // m_ssMagic
         // m_upperLimitN
         // m_N
         if(m_MaxThreadDuration == MaxSingleCoreMS)
            std::cerr << "\n  " << __FILE__ << " by Douglas O. Moen     Compiled on = "
                      << __DATE__ << "  " << __TIME__ << "\n";
      }

private:

   inline bool checkMagicRE(uint64_t nE) // n is known even, and the recursion starting point
      {
         return(true);  // for unit test, comment out this line to run the asserts

         // unit test - enable both asserts
         assert(nE == m_N);             // confirm recursion start value
         assert(ZERO == (nE & B00));    // confirm even
         return(true);                  // nothing more to do
      }

   inline bool checkMagicRO(uint64_t nO) // n is known odd,  and the recursion starting point
      {
         // unit test - uncomment the following line to measure checkMagicRE() performance
         // return (true);  // when both methods do nothing -  0.044

         // unit test - enable both these asserts
         // assert(nO == m_N);  // confirm recursion start value
         // assert(nO & B00);   // confirm odd
         uint64_t nxt;
         {
            assert(nO < m_upperLimitN);  // assert that NO wrap-around occurs
            // NOTE: the sum of 4 odd uint's is even, and is destined to be
            // divided by 2 in the next recursive invocation.
            // we need not wait to divide by 2
            nxt = ((nO+nO+nO+ONE)  >> ONE); // combine the arithmetic
            //      ||||||||||||   ||||||
            //      sum is even    unknown after sum divided by two
         }
         return (checkMagicR(nxt));
      }


   inline void unwind8()
      {
         // skip 0, 1
         assert(checkMagicRE (m_N)); m_N++;  // 2
         assert(checkMagicRO (m_N)); m_N++;  // 3
         assert(checkMagicRE (m_N)); m_N++;  // 4
         assert(checkMagicRO (m_N)); m_N++;  // 5
         assert(checkMagicRE (m_N)); m_N++;  // 6
         assert(checkMagicRO (m_N)); m_N++;  // 7
         assert(checkMagicRE (m_N)); m_N++;  // 8
         assert(checkMagicRO (m_N)); m_N++;  // 9
      }

   inline void unwind10()
      {
         assert(checkMagicRE (m_N)); m_N++;  // 0
         assert(checkMagicRO (m_N)); m_N++;  // 1
         assert(checkMagicRE (m_N)); m_N++;  // 2
         assert(checkMagicRO (m_N)); m_N++;  // 3
         assert(checkMagicRE (m_N)); m_N++;  // 4
         assert(checkMagicRO (m_N)); m_N++;  // 5
         assert(checkMagicRE (m_N)); m_N++;  // 6
         assert(checkMagicRO (m_N)); m_N++;  // 7
         assert(checkMagicRE (m_N)); m_N++;  // 8
         assert(checkMagicRO (m_N)); m_N++;  // 9
      }

   inline void unwind98()
      {
         unwind8();
         unwind10();  // 1
         unwind10();  // 2
         unwind10();  // 3
         unwind10();  // 4
         unwind10();  // 5
         unwind10();  // 6
         unwind10();  // 7
         unwind10();  // 8
         unwind10();  // 9
      }

   inline void unwind100()
      {
         unwind10();  // 0
         unwind10();  // 1
         unwind10();  // 2
         unwind10();  // 3
         unwind10();  // 4
         unwind10();  // 5
         unwind10();  // 6
         unwind10();  // 7
         unwind10();  // 8
         unwind10();  // 9
      }

   inline void innerLoop (uint64_t start_N, uint64_t end_N)
      {
         for (uint64_t iLoop = start_N; iLoop < end_N; iLoop += 100)
         {
            unwind100();
         }

         reportProgress();
      }

   inline void outerLoop(const std::vector<uint64_t>&  start_Ns,
                         const std::vector<uint64_t>&  end_Ns)
      {
         *m_so << "\n outerLoop \n  m_upperLimitN = 0x"
               << std::hex << std::setfill('0') << std::setw(16) << m_upperLimitN
               << "\n            m_N = 0x"      << std::setw(16) << start_Ns[0]
               << "  " << std::dec << std::setfill(' ') << std::setw(3) << 0
               << "."  << std::dec << std::setfill('0') << std::setw(3) << 0
               << " sec  (" << std::dec << std::setfill(' ') << std::setw(10)
               << start_Ns[0] << ")" << std::endl;

         size_t szStart = start_Ns.size();
         size_t szEnd   = end_Ns.size();
         assert(szEnd == szStart);

         m_blk = 0;

         {  // when blk 0 starts at offset 2 (to skip values 0 and 1)
            if(TWO == start_Ns[0])
            {
               m_N = TWO;  // set m_N start

               unwind98(); // skip 0 and 1

               assert(100 == m_N); // confirm

               innerLoop (100, end_Ns[m_blk]);

               m_blk += 1;
            }
            else // thread 2 blk 0 starts in the middle of 5 billion range
            {
               m_N = start_Ns[m_blk];  // set m_N start, do full block
            }
         }

         for (; m_blk < szStart;  ++m_blk)
         {
            innerLoop (start_Ns[m_blk], end_Ns[m_blk]);
         }
      }


   // 1 == argc
   void  exec1 (void) // no parameters --> check all values
      {
         const std::vector<uint64_t> start_Ns =
            {         TWO, (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4),
              (BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9)  };
         //    blk 0        blk 1        blk 2        blk 3        blk 4
         //    blk 5        blk 6        blk 7        blk 8        blk 9
         const std::vector<uint64_t> end_Ns =
            { (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5),
              (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };

         m_startMS = Clock_t::now();

         // outerLoop :  10 blocks generates 10 progressReports()
         outerLoop( start_Ns, end_Ns);

         ssDurationPut("");

         std::cout << "   execDuration = " << m_ssDuration.str() << " " << std::endl;
      } // void  exec1 (void)


   void exec2a ()  //  thread entry a
      {
         //lower half of test range
         const std::vector<uint64_t> start_Ns =
            {        TWO , (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4) };
         //    blk 0        blk 1        blk 2        blk 3        blk 4
         const std::vector<uint64_t> end_Ns =
            { (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5) };

         m_startMS = Clock_t::now();

         // m_so to std::cout
         // uses 5 blocks to gen 5 progress indicators
         outerLoop (start_Ns, end_Ns);

         ssDurationPut("");

         std::cout << "   execDuration =  " << m_ssDuration.str() << " (a)   " << std::endl;
      } // exec2a (void)


   void exec2b ()  //  thread entry b
      {
         // upper half of test range
         const std::vector<uint64_t> start_Ns =
            { (BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9)  };
         //    blk 5        blk 6        blk 7        blk 8        blk 9
         const std::vector<uint64_t> end_Ns =
            { (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };

         m_startMS = Clock_t::now();

         m_so = &m_ssMagic;
         // uses 5 blocks to gen 5 progress indicators
         outerLoop(start_Ns, end_Ns);

         ssDurationPut("");

         m_ssMagic << "   execDuration =  " << m_ssDuration.str() << " (b)  " << std::endl;
      } // exec2b (void)


   // 3 == argc
   int exec3 (std::string argv1,  // recursion start value
              std::string argv2)  // values count
      {
         uint64_t start_N = std::stoull(argv1);
         uint64_t length  = std::stoull(argv2);

         // range checks
         {
            std::string note1;
            switch (start_N) // limit check
            {
            case 0: { start_N = 3; length -= 3;
                  note1 = "... 0 is below test range (change start to 3 ";
            } break;

            case 1: { start_N = 3; length -= 2;
                  note1 = "... 1 is defined magic (change start to 3";
            } break;

            case 2: { start_N = 3; length -= 1;
                  note1 = "... 2 is trivially magic (change start to 3";
            } break;

            default:
            {  // when start_N from user is even
               if (ZERO == (start_N & B00))
               {
                  start_N -= 1;
                  note1 = "... even is not an interesting starting point";
                  length  += 1;
               }
            }
            break;

            } // switch (start_N)

            // start_N not restrained ... allow any manual search
            //    but uin64_t wrap-around still preempted

            std::cout << "\n start_N: " << start_N << "  " << note1
                      << "\n  length: " << length  << "  " << std::endl;
         }

         m_startMS = Clock_t::now();

         for (m_N = start_N;   m_N < (start_N + length);   ++m_N)
         {
            bool magicNumberFound = checkMagicRD (m_N, 1);

            std::cout << m_ssMagic.str() << std::dec << m_N
                      << "    m_MaxRcrsDpth: "  << m_MaxRcrsDpth << ";"
                      << "    m_Max_n: "  << m_Max_n << ";" << std::endl;

            m_ssMagic.str(std::string()); // empty string
            m_ssMagic.clear();            // clear flags)
            if (!magicNumberFound)
            {
               std::cerr << m_N << " not a magic number!" << std::endl;
               break;
            }

         } // for (uint64_t k = kStart; k < kEnd; ++k)

         ssDurationPut("");

         std::cout << "   execDuration =  " << m_ssDuration.str() << "    " << std::endl;

         return(0);
      } // int exec3 (std::string argv1,  std::string argv2)


   // conversion
   std::string ssMagicShow() { return (m_ssMagic.str()); }


   // D3: use recursion for closer match to definition
   bool checkMagicR(uint64_t n)   // recursive Performance
      {
         uint64_t nxt;
         {
            if (n & B00) // n-odd
            {
               if (n >= m_upperLimitN) // wraparound imminent abort
                  return (false);    // recursion termination clause

               // NOTE: the sum of 4 odd uint's is even, and will be halved
               // we need not wait for the next recursion to divide-by-2
               nxt = ((n+n+n+ONE)  >> ONE); // combine the arithmetic
               //     known even
            }
            else // n-even
            {
               nxt = (n >> ONE);   // bit shift divide by 2

               if (nxt < m_N)      // nxt < m_N start of recursion
                  return ( true ); // recursion termination clause
               // We need not de-curse to 1 because all
               // (n < m_N) are asserted as magic
            }
         }
         return (checkMagicR(nxt));  // tail recursion
      } // bool  checkMagicR(uint64_t n)


   // D4: diagnostic text output
   bool checkMagicRD (uint64_t n, uint64_t rd)  // rd: recursion depth
      {
         if (n > m_Max_n) m_Max_n = n;
         m_TotalRecurse += 1;

         m_ssMagic << "\n" << rd << std::setw(static_cast<int>(rd)) << " "
                   << " checkMagicRD (" << m_N << ", " << n << ")";

         uint64_t nxt;
         {
            if (n & B00) // n-odd
            {
               if (n >= m_upperLimitN) // wraparound imminent abort
                  return ( terminateCheckMagic (nxt, rd, false) );

               // NOTE: the sum of 4 odd uint's is even, and will be divided by 2
               // we need not wait for the next recursion to divide by 2
               nxt = ((n+n+n+ONE)  >> ONE); // combine the arithmetic
               //      |||||||||   ||||||
               //    sum is even   unknown after divide by two
            }
            else // n-even
            {
               nxt = (n >> ONE);     // bit shift divide by 2

               if (nxt < m_N)      // nxt < m_N start of recursion
                  return ( terminateCheckMagic (nxt, rd, true) );
               // We need not de-curse to 1 because all
               // (n < m_N) are asserted as magic
            }
         }
         return (checkMagicRD (nxt, (rd+1))); // tail recursion
      } //  bool checkMagicRD(uint64_t, uint64_t rd)


   bool terminateCheckMagic (uint64_t n, uint64_t rd, bool rslt)
      {
         if (rd > m_MaxRcrsDpth) // diag only
         {
            std::chrono::milliseconds  durationMS = getChronoMillisecond();

            uint64_t durationSec  = durationMS.count() / 1000;
            uint64_t durationMSec = durationMS.count() % 1000;

            m_MaxRcrsDpth = rd;
            // generate noise each update - this does not happen often
            static uint64_t count = 0;
            std::cerr << "\n\n" << std::setfill(' ')
                      << std::setw(30) << "["
                      << std::setw(4) << durationSec << "." << std::setfill('0') << std::setw(3)
                      << durationMSec << std::setfill(' ')
                      << " sec]   m_N: " << std::setw(10) << m_N
                      << "    n: " << std::setw(10) << n
                      << "    MaxRcrsDpth: " << std::setw(5) << m_MaxRcrsDpth
                      << "    Max_n:  " << std::setw(16) << m_Max_n
                      << "   (" << count << ")" << std::flush;
            count += 1; // how many times MaxRcrsDpth updated
         }

         m_ssMagic << "\n " << std::setw(3) << rd << std::setw(static_cast<int>(rd)) << " "
                   << "  " << (rslt ? "True " : ">>>false<<< ") << m_N << ", ";

         return (rslt);
      }

   uint64_t initWrapAroundLimit()
      {
         uint64_t upLimit = 0;
         uint64_t u64MAX = std::numeric_limits<uint64_t>::max();
         // there exist a max number,  m_upperLimitN
         // where 3n+1 < u64MAX, i.e.
         upLimit = ((u64MAX - 1) / 3);

         return (upLimit);
      } // uint64_t initWrapAroundLimit()

public:

   int exec (int argc, char* argv[])
      {
         int retVal = -1;

         { time_t t0 = std::time(nullptr); while(t0 == time(nullptr)) { /* do nothing*/ }; }
         // result: consistent run time

         switch (argc)
         {

         case 1: // no parameters: 5 billion tests, 10 blocks, this instance, < 80 sec
         {
            exec1();
            retVal = 0;
         }
         break;

         case 2: // one parameter: 2 threads each runs 5/2 billion tests, in 5 blocks,
         {  //                     two new instances,  < 40 sec
            // 2 new objects
            T431_t  t431a(MaxDualCoreMS); // lower test sequence
            T431_t  t431b(MaxDualCoreMS); // upper test sequence

            // 2 threads
            std::thread tA (&T431_t::exec2a, &t431a);
            std::thread tB (&T431_t::exec2b, &t431b);

            // 2 join's
            tA.join();
            tB.join();

            // lower test sequence (tA) output directly to std::cout
            // upper test sequence (tB) captured in T431_t::m_ssMagic.  send it now
            std::cout << t431b.ssMagicShow() << std::endl;

            retVal = 0;
         }
         break;


         case 3: // argv[1] -> start address,  argv[2] -> length,
         {
            retVal = exec3 (std::string(argv[1]), std::string(argv[2]));
         } break;


         default :
         {
            std::cerr
               << "\nUsage: "
               << "\n argc (= " << argc << ") not expected, should be 0, 1, or 2 parameters."
               << "\n   1  (no parameters) : 5 billion tests, 10 blocks, < 80 sec\n"
               //
               << "\n   2  (one parameter) : 2 threads, each 5/2 billion tests, 5 blocks,  < 40 sec\n"
               //
               << "\n   3  (two parameters): verbose mode: start argv[1], test count argv[2] \n"
               //
               << "\n   3.1: \"./dumy431V4 3 1000  1> /tmp/dumy431V4.out2   2> /tmp/dumy431V4.out2a  "
               << "\n                     3  1 K    std::cout redirected     std::cerr redirected \n"
               //
               << "\n   3.2: \"./dumy431V4 3 1000000000  1>  /dev/null      2> /tmp/dumy431V4.out2b   "
               << "\n                     3  1 B       discard std::cout  std::cerr redirected \n"
               //
               << "\n   3.3: \"./dumy431V4 15 100   "
               << "\n                     15 100   (cout and cerr to console) \n"
               << std::endl;
            retVal = 0;
         }

         } // switch

         return(retVal);
      } // int exec(int argc, char* argv[])

};  // class T431


int main (int argc, char* argv[])
{
   std::cout << "argc: " << argc << std::endl;
   for (int i=0; i<argc; i+=1) std::cout << argv[i] << " ";
   std::cout << std::endl;

   setlocale(LC_ALL, "");
   std::ios::sync_with_stdio(false);

   int retVal;
   {
      T431_t  t431;
      retVal = t431.exec(argc, argv);
   }

   std::cout << "\nFINI " << std::endl;
   return(retVal);
}

The code submitted:

a) uses assert(x) for all checks (because assert is quick and has side-effects that the optimizer can not ignore).

b) detects any endless loop by using an assert of the duration.

c) asserts to prevent any wrap-around.

When any assert occurs, the program does not terminate normally.

When this code finishes normally, it means:

 a) no disproven numbers found, and
 b) the running time < 80 seconds, so no endless loop occurred      and
 c) no wrap-around occurred

Notes about Recursive checkMagicR(uint64_t n):

3 forms of n are available in the recursion:

  • formal parameter n - typical of recursion calls

  • auto var nxt - receives the computed value of n for next recursive call

  • data attribute: m_N - the starting point of the currently running recursion effort

2785528
  • 5,438
  • 2
  • 18
  • 20
  • Tail Recursion demonstration: when built with -O0, the code takes longer than 80 seconds and asserts. The progress reports suggest ~18 seconds (with -O0) vs ~4.3 seconds (with -O3). – 2785528 Dec 18 '16 at 13:11
  • FYI: I use make. The compile command it generates is: g++-5 -m64 -O3 -ggdb -std=c++14 -Wall -Wextra -Wshadow -Wnon-virtual-dtor -pedantic -Wcast-align -Wcast-qual -Wconversion -Wpointer-arith -Wunused -Woverloaded-virtual -O3 dumy431V9.cc -o dumy431V9 -pthread. The code also compiles and runs (full speed) using -std=C++11. – 2785528 Dec 18 '16 at 13:16
  • I consider the 'average' (44 / 5B) as distressingly small ... I think it is possible that the optimizer has still surprised me. (ideas?) – 2785528 Dec 18 '16 at 15:09
  • Trivia 1: the Max Recursion Depth (of submitted code) is 447 calls, at test input 2,788,008,987 ... surprisingly close to the middle of the 5,000,000,000 test range. Max_n in that recursion is 3,562,942,561,397,226,080. – 2785528 Dec 20 '16 at 03:16
0

I was speculating about what changes the optimizer might have made for this particular tail recursion.

So I created an iterative answer from my V9 code (in my other answer).

Change From:

bool checkMagicR(uint64_t n) 
   {
      //.. replace all 
   }

Change To:

// ITERATIVE VERSION 
bool checkMagicR(uint64_t n)   // ITERATIVE Performance
   {
      do
      {  uint64_t nxt;

         if (n & B00) // n-odd
         {
            if (n >= m_upperLimitN) // wraparound imminent abort
               return (false);    // recursion termination clause.

            // NOTE: the sum of 4 odd uint's is even, and will be halved
            // we need not wait for the next recursion to divide-by-2
            nxt = ((n+n+n+ONE)  >> ONE); // combine the arithmetic
            //     known even
         }
         else // n-even
         {
            nxt = (n >> ONE);   // bit shift divide by 2

            if (nxt < m_N)      // nxt < m_N start of recursion
               return ( true );     // recursion termination clause.
            // We need not de-curse to 1 because all
            // (n < m_N) are asserted as magic
         }

         n = nxt;
      } while(true);  // NO  recursion
   }

Did not fix the name. Did not fix comments or other mentioning of 'recursion' in this method or others. I changed some comments at top and bottom of this method.

Results:

  • Performance is identical. (44 seconds)

  • Mode 1 and mode 2 are both iterative.

  • Mode 3 is (still) recursive.

2785528
  • 5,438
  • 2
  • 18
  • 20