what can I do to speed it up? Am I missing obvious optimization issues?
- for(unsigned long i = 1; i <= 5000000000; i++)
A for-loop is not the fastest coding style to invoke a method 5B times.
In my code see unwind100() and innerLoop(). The unwind'ing I coded removed 99% of the innerLoop overhead, which saved more than 5 seconds (on my desktop DD5). Your savings may vary. Wikipedia has an article about unwinding a loop, with a brief description of the savings potential.
This code apparently generates progress reports. At a cost of 5 billion
comparisons, it accomplishes nothing for the magic number checks.
See my code outerLoop, which calls innerLoop 10 times. This provides a
convenient location for 1 progress report each outer loop. Consider splitting
your loops into an 'innerLoop' and an 'outerLoop'. Testing 5 Billion times is
more expensive than adding 10 calls to innerLoops.
- As in the comments, your checkMagic() return's '1' regardless of test results. A simple error, but I do not know if this error affects performance.
I think your checkMagic() does not achieve tail-recursion, which really will slow your code down.
See my method checkMagicR(), which has tail recursion, and returns a true or
false.
Also, in your CheckMagic(), in the clause for odd input, you ignore the idea that (3n+1) must be an even number when n is positive and odd. My code performs an 'early-div-by-2', thus reducing the future effort when possible.
- In my methods unwind10(), note that my code exploits the known even/odd pattern of input the for the sequence (2..5 Billion), by using checkMagicRO() (for odd inputs) and checkMagicRE() (for even).
In my earlier versions, the code wasted time detecting that the known even input was even, then dividing it by 2, and then returning true. checkMagicRE() skips the test and divide, saving time.
CheckMagicRO() skips the test for odd, then performs the odd stuff and continues into checkMagicR().
- if(number != 1) { checkMagic(number); }
Your recursion continues all the way down to 1 ... not necessary, and potentially the biggest hit to overall performance.
See my checkMagicR() "recursion termination clause". My code stops when ((n/2) < m_N), m_N being the test value that started this recursion. The idea is that all smaller test inputs are already proven magic, else the code would have asserted. It also halts when wraparound is imminent ... which has not happened.
Results:
My desktop (DD5) is older, slower, and running Ubuntu 15.10 (64). This code was developed with g++ v5.2.1, and only finishes without timeout assert using -O3.
DD5 is apparently 2x slower than the machine used by Amy Tavory (comparing how his code ran on DD5).
On DD5, my code completes in ~44 seconds in mode 1, and 22 seconds in mode 2.
A faster machine might complete this code in 11 seconds (in mode 2).
Full Code:
// V9
#include <chrono>
#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>
#include <thread>
#include <vector>
#include <cassert>
// chrono typedef - shorten one long name
typedef std::chrono::high_resolution_clock Clock_t;
class T431_t
{
private:
uint64_t m_N; // start-of-current-recursion
const uint64_t m_upperLimitN; // wrap-around avoidance / prevention
std::stringstream m_ssMagic; // dual threads use separate out streams
uint64_t m_MaxRcrsDpth; // diag only
uint64_t m_Max_n; // diag only
uint64_t m_TotalRecurse; // diag only
Clock_t::time_point m_startMS; // Derived req - progress info
std::stringstream m_ssDuration;
std::ostream* m_so; // dual threads - std::cout or m_ssMagic
uint64_t m_blk;
const int64_t m_MaxThreadDuration; // single Max80KMs, dual Max40KMs
// enum of known type (my preference for 'internal' constants)
enum T431_Constraints : uint64_t
{
ZERO = 0,
B00 = 1, // least-significant-bit is bit 0
ONE = 1, // lsbit set
TWO = 2,
THREE = 3,
//
K = 1000, // instead of 1024,
//
BlkSize = ((K * K * K) / 2), // 0.5 billion
//
MaxSingleCoreMS = (80 * K), // single thread max
MaxDualCoreMS = (40 * K) // dual thread max
};
// convenience method: show both hex and decimal on one line (see dtor)
inline std::string showHexDec(const std::string& label, uint64_t val) {
std::stringstream ss;
ss << "\n" << label
<< " = 0x" << std::hex << std::setfill('0') << std::setw(16) << val
<< " (" << std::dec << std::setfill(' ') << std::setw(22) << val << ")";
return (ss.str());
}
// duration: now() - m_startMS
std::chrono::milliseconds getChronoMillisecond() {
return (std::chrono::duration_cast<std::chrono::milliseconds>
(Clock_t::now() - m_startMS));
}
// reports milliseconds since m_startMS time
void ssDurationPut (const std::string& label)
{
m_ssDuration.str(std::string()); // empty string
m_ssDuration.clear(); // clear ss flags
std::chrono::milliseconds durationMS = getChronoMillisecond();
uint64_t durationSec = (durationMS.count() / 1000);
uint64_t durationMSec = (durationMS.count() % 1000);
m_ssDuration
<< label << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec";
}
inline void reportProgress()
{
std::chrono::milliseconds durationMS = getChronoMillisecond();
assert(durationMS.count() < m_MaxThreadDuration); // normal finish -> "no endless loop"
//
uint64_t durationSec = (durationMS.count() / 1000);
uint64_t durationMSec = (durationMS.count() % 1000);
*m_so << " m_blk = " << m_blk
<< " m_N = 0x" << std::setfill('0') << std::hex << std::setw(16) << m_N
<< " " << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec (" << std::dec << std::setfill(' ') << std::setw(10) << m_N << ")"
<< std::endl;
}
public:
T431_t(uint64_t maxDuration = MaxSingleCoreMS) :
m_N (TWO), // start val of current recursion
m_upperLimitN (initWrapAroundLimit()),
// m_ssMagic // default ctor ok - empty
m_MaxRcrsDpth (ZERO),
m_Max_n (TWO),
m_TotalRecurse (ZERO),
m_startMS (Clock_t::now()),
// m_ssDuration // default ctor ok - empty
m_so (&std::cout), // default
m_blk (ZERO),
m_MaxThreadDuration (maxDuration)
{
m_ssMagic.str().reserve(1024*2);
m_ssDuration.str().reserve(256);
}
~T431_t()
{
// m_MaxThreadDuration
// m_blk
// m_so
// m_ssDuration
// m_startMS
// m_Max_n
// m_TotalRecurse
// m_MaxRcrsDpth
if (m_MaxRcrsDpth) // diag only
{
ssDurationPut ("\n duration = " );
std::cerr << "\n T431() dtor "
//
<< showHexDec(" u64MAX",
std::numeric_limits<uint64_t>::max()) << "\n"
//
<< showHexDec(" m_Max_n", m_Max_n)
<< showHexDec(" (3*m_Max_n)+1", ((3*m_Max_n)+1))
<< showHexDec(" upperLimitN", m_upperLimitN)
// ^^^^^^^^^^^ no wrap-around possible when n is less
//
<< "\n"
<< showHexDec(" m_MaxRcrsDpth", m_MaxRcrsDpth)
<< "\n"
<< showHexDec(" m_TotalRecurse", m_TotalRecurse)
//
<< m_ssDuration.str() << std::endl;
}
// m_ssMagic
// m_upperLimitN
// m_N
if(m_MaxThreadDuration == MaxSingleCoreMS)
std::cerr << "\n " << __FILE__ << " by Douglas O. Moen Compiled on = "
<< __DATE__ << " " << __TIME__ << "\n";
}
private:
inline bool checkMagicRE(uint64_t nE) // n is known even, and the recursion starting point
{
return(true); // for unit test, comment out this line to run the asserts
// unit test - enable both asserts
assert(nE == m_N); // confirm recursion start value
assert(ZERO == (nE & B00)); // confirm even
return(true); // nothing more to do
}
inline bool checkMagicRO(uint64_t nO) // n is known odd, and the recursion starting point
{
// unit test - uncomment the following line to measure checkMagicRE() performance
// return (true); // when both methods do nothing - 0.044
// unit test - enable both these asserts
// assert(nO == m_N); // confirm recursion start value
// assert(nO & B00); // confirm odd
uint64_t nxt;
{
assert(nO < m_upperLimitN); // assert that NO wrap-around occurs
// NOTE: the sum of 4 odd uint's is even, and is destined to be
// divided by 2 in the next recursive invocation.
// we need not wait to divide by 2
nxt = ((nO+nO+nO+ONE) >> ONE); // combine the arithmetic
// |||||||||||| ||||||
// sum is even unknown after sum divided by two
}
return (checkMagicR(nxt));
}
inline void unwind8()
{
// skip 0, 1
assert(checkMagicRE (m_N)); m_N++; // 2
assert(checkMagicRO (m_N)); m_N++; // 3
assert(checkMagicRE (m_N)); m_N++; // 4
assert(checkMagicRO (m_N)); m_N++; // 5
assert(checkMagicRE (m_N)); m_N++; // 6
assert(checkMagicRO (m_N)); m_N++; // 7
assert(checkMagicRE (m_N)); m_N++; // 8
assert(checkMagicRO (m_N)); m_N++; // 9
}
inline void unwind10()
{
assert(checkMagicRE (m_N)); m_N++; // 0
assert(checkMagicRO (m_N)); m_N++; // 1
assert(checkMagicRE (m_N)); m_N++; // 2
assert(checkMagicRO (m_N)); m_N++; // 3
assert(checkMagicRE (m_N)); m_N++; // 4
assert(checkMagicRO (m_N)); m_N++; // 5
assert(checkMagicRE (m_N)); m_N++; // 6
assert(checkMagicRO (m_N)); m_N++; // 7
assert(checkMagicRE (m_N)); m_N++; // 8
assert(checkMagicRO (m_N)); m_N++; // 9
}
inline void unwind98()
{
unwind8();
unwind10(); // 1
unwind10(); // 2
unwind10(); // 3
unwind10(); // 4
unwind10(); // 5
unwind10(); // 6
unwind10(); // 7
unwind10(); // 8
unwind10(); // 9
}
inline void unwind100()
{
unwind10(); // 0
unwind10(); // 1
unwind10(); // 2
unwind10(); // 3
unwind10(); // 4
unwind10(); // 5
unwind10(); // 6
unwind10(); // 7
unwind10(); // 8
unwind10(); // 9
}
inline void innerLoop (uint64_t start_N, uint64_t end_N)
{
for (uint64_t iLoop = start_N; iLoop < end_N; iLoop += 100)
{
unwind100();
}
reportProgress();
}
inline void outerLoop(const std::vector<uint64_t>& start_Ns,
const std::vector<uint64_t>& end_Ns)
{
*m_so << "\n outerLoop \n m_upperLimitN = 0x"
<< std::hex << std::setfill('0') << std::setw(16) << m_upperLimitN
<< "\n m_N = 0x" << std::setw(16) << start_Ns[0]
<< " " << std::dec << std::setfill(' ') << std::setw(3) << 0
<< "." << std::dec << std::setfill('0') << std::setw(3) << 0
<< " sec (" << std::dec << std::setfill(' ') << std::setw(10)
<< start_Ns[0] << ")" << std::endl;
size_t szStart = start_Ns.size();
size_t szEnd = end_Ns.size();
assert(szEnd == szStart);
m_blk = 0;
{ // when blk 0 starts at offset 2 (to skip values 0 and 1)
if(TWO == start_Ns[0])
{
m_N = TWO; // set m_N start
unwind98(); // skip 0 and 1
assert(100 == m_N); // confirm
innerLoop (100, end_Ns[m_blk]);
m_blk += 1;
}
else // thread 2 blk 0 starts in the middle of 5 billion range
{
m_N = start_Ns[m_blk]; // set m_N start, do full block
}
}
for (; m_blk < szStart; ++m_blk)
{
innerLoop (start_Ns[m_blk], end_Ns[m_blk]);
}
}
// 1 == argc
void exec1 (void) // no parameters --> check all values
{
const std::vector<uint64_t> start_Ns =
{ TWO, (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4),
(BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9) };
// blk 0 blk 1 blk 2 blk 3 blk 4
// blk 5 blk 6 blk 7 blk 8 blk 9
const std::vector<uint64_t> end_Ns =
{ (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5),
(BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };
m_startMS = Clock_t::now();
// outerLoop : 10 blocks generates 10 progressReports()
outerLoop( start_Ns, end_Ns);
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " " << std::endl;
} // void exec1 (void)
void exec2a () // thread entry a
{
//lower half of test range
const std::vector<uint64_t> start_Ns =
{ TWO , (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4) };
// blk 0 blk 1 blk 2 blk 3 blk 4
const std::vector<uint64_t> end_Ns =
{ (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5) };
m_startMS = Clock_t::now();
// m_so to std::cout
// uses 5 blocks to gen 5 progress indicators
outerLoop (start_Ns, end_Ns);
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " (a) " << std::endl;
} // exec2a (void)
void exec2b () // thread entry b
{
// upper half of test range
const std::vector<uint64_t> start_Ns =
{ (BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9) };
// blk 5 blk 6 blk 7 blk 8 blk 9
const std::vector<uint64_t> end_Ns =
{ (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };
m_startMS = Clock_t::now();
m_so = &m_ssMagic;
// uses 5 blocks to gen 5 progress indicators
outerLoop(start_Ns, end_Ns);
ssDurationPut("");
m_ssMagic << " execDuration = " << m_ssDuration.str() << " (b) " << std::endl;
} // exec2b (void)
// 3 == argc
int exec3 (std::string argv1, // recursion start value
std::string argv2) // values count
{
uint64_t start_N = std::stoull(argv1);
uint64_t length = std::stoull(argv2);
// range checks
{
std::string note1;
switch (start_N) // limit check
{
case 0: { start_N = 3; length -= 3;
note1 = "... 0 is below test range (change start to 3 ";
} break;
case 1: { start_N = 3; length -= 2;
note1 = "... 1 is defined magic (change start to 3";
} break;
case 2: { start_N = 3; length -= 1;
note1 = "... 2 is trivially magic (change start to 3";
} break;
default:
{ // when start_N from user is even
if (ZERO == (start_N & B00))
{
start_N -= 1;
note1 = "... even is not an interesting starting point";
length += 1;
}
}
break;
} // switch (start_N)
// start_N not restrained ... allow any manual search
// but uin64_t wrap-around still preempted
std::cout << "\n start_N: " << start_N << " " << note1
<< "\n length: " << length << " " << std::endl;
}
m_startMS = Clock_t::now();
for (m_N = start_N; m_N < (start_N + length); ++m_N)
{
bool magicNumberFound = checkMagicRD (m_N, 1);
std::cout << m_ssMagic.str() << std::dec << m_N
<< " m_MaxRcrsDpth: " << m_MaxRcrsDpth << ";"
<< " m_Max_n: " << m_Max_n << ";" << std::endl;
m_ssMagic.str(std::string()); // empty string
m_ssMagic.clear(); // clear flags)
if (!magicNumberFound)
{
std::cerr << m_N << " not a magic number!" << std::endl;
break;
}
} // for (uint64_t k = kStart; k < kEnd; ++k)
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " " << std::endl;
return(0);
} // int exec3 (std::string argv1, std::string argv2)
// conversion
std::string ssMagicShow() { return (m_ssMagic.str()); }
// D3: use recursion for closer match to definition
bool checkMagicR(uint64_t n) // recursive Performance
{
uint64_t nxt;
{
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return (false); // recursion termination clause
// NOTE: the sum of 4 odd uint's is even, and will be halved
// we need not wait for the next recursion to divide-by-2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// known even
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( true ); // recursion termination clause
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
}
return (checkMagicR(nxt)); // tail recursion
} // bool checkMagicR(uint64_t n)
// D4: diagnostic text output
bool checkMagicRD (uint64_t n, uint64_t rd) // rd: recursion depth
{
if (n > m_Max_n) m_Max_n = n;
m_TotalRecurse += 1;
m_ssMagic << "\n" << rd << std::setw(static_cast<int>(rd)) << " "
<< " checkMagicRD (" << m_N << ", " << n << ")";
uint64_t nxt;
{
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return ( terminateCheckMagic (nxt, rd, false) );
// NOTE: the sum of 4 odd uint's is even, and will be divided by 2
// we need not wait for the next recursion to divide by 2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// ||||||||| ||||||
// sum is even unknown after divide by two
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( terminateCheckMagic (nxt, rd, true) );
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
}
return (checkMagicRD (nxt, (rd+1))); // tail recursion
} // bool checkMagicRD(uint64_t, uint64_t rd)
bool terminateCheckMagic (uint64_t n, uint64_t rd, bool rslt)
{
if (rd > m_MaxRcrsDpth) // diag only
{
std::chrono::milliseconds durationMS = getChronoMillisecond();
uint64_t durationSec = durationMS.count() / 1000;
uint64_t durationMSec = durationMS.count() % 1000;
m_MaxRcrsDpth = rd;
// generate noise each update - this does not happen often
static uint64_t count = 0;
std::cerr << "\n\n" << std::setfill(' ')
<< std::setw(30) << "["
<< std::setw(4) << durationSec << "." << std::setfill('0') << std::setw(3)
<< durationMSec << std::setfill(' ')
<< " sec] m_N: " << std::setw(10) << m_N
<< " n: " << std::setw(10) << n
<< " MaxRcrsDpth: " << std::setw(5) << m_MaxRcrsDpth
<< " Max_n: " << std::setw(16) << m_Max_n
<< " (" << count << ")" << std::flush;
count += 1; // how many times MaxRcrsDpth updated
}
m_ssMagic << "\n " << std::setw(3) << rd << std::setw(static_cast<int>(rd)) << " "
<< " " << (rslt ? "True " : ">>>false<<< ") << m_N << ", ";
return (rslt);
}
uint64_t initWrapAroundLimit()
{
uint64_t upLimit = 0;
uint64_t u64MAX = std::numeric_limits<uint64_t>::max();
// there exist a max number, m_upperLimitN
// where 3n+1 < u64MAX, i.e.
upLimit = ((u64MAX - 1) / 3);
return (upLimit);
} // uint64_t initWrapAroundLimit()
public:
int exec (int argc, char* argv[])
{
int retVal = -1;
{ time_t t0 = std::time(nullptr); while(t0 == time(nullptr)) { /* do nothing*/ }; }
// result: consistent run time
switch (argc)
{
case 1: // no parameters: 5 billion tests, 10 blocks, this instance, < 80 sec
{
exec1();
retVal = 0;
}
break;
case 2: // one parameter: 2 threads each runs 5/2 billion tests, in 5 blocks,
{ // two new instances, < 40 sec
// 2 new objects
T431_t t431a(MaxDualCoreMS); // lower test sequence
T431_t t431b(MaxDualCoreMS); // upper test sequence
// 2 threads
std::thread tA (&T431_t::exec2a, &t431a);
std::thread tB (&T431_t::exec2b, &t431b);
// 2 join's
tA.join();
tB.join();
// lower test sequence (tA) output directly to std::cout
// upper test sequence (tB) captured in T431_t::m_ssMagic. send it now
std::cout << t431b.ssMagicShow() << std::endl;
retVal = 0;
}
break;
case 3: // argv[1] -> start address, argv[2] -> length,
{
retVal = exec3 (std::string(argv[1]), std::string(argv[2]));
} break;
default :
{
std::cerr
<< "\nUsage: "
<< "\n argc (= " << argc << ") not expected, should be 0, 1, or 2 parameters."
<< "\n 1 (no parameters) : 5 billion tests, 10 blocks, < 80 sec\n"
//
<< "\n 2 (one parameter) : 2 threads, each 5/2 billion tests, 5 blocks, < 40 sec\n"
//
<< "\n 3 (two parameters): verbose mode: start argv[1], test count argv[2] \n"
//
<< "\n 3.1: \"./dumy431V4 3 1000 1> /tmp/dumy431V4.out2 2> /tmp/dumy431V4.out2a "
<< "\n 3 1 K std::cout redirected std::cerr redirected \n"
//
<< "\n 3.2: \"./dumy431V4 3 1000000000 1> /dev/null 2> /tmp/dumy431V4.out2b "
<< "\n 3 1 B discard std::cout std::cerr redirected \n"
//
<< "\n 3.3: \"./dumy431V4 15 100 "
<< "\n 15 100 (cout and cerr to console) \n"
<< std::endl;
retVal = 0;
}
} // switch
return(retVal);
} // int exec(int argc, char* argv[])
}; // class T431
int main (int argc, char* argv[])
{
std::cout << "argc: " << argc << std::endl;
for (int i=0; i<argc; i+=1) std::cout << argv[i] << " ";
std::cout << std::endl;
setlocale(LC_ALL, "");
std::ios::sync_with_stdio(false);
int retVal;
{
T431_t t431;
retVal = t431.exec(argc, argv);
}
std::cout << "\nFINI " << std::endl;
return(retVal);
}
The code submitted:
a) uses assert(x) for all checks (because assert is quick and has
side-effects that the optimizer can not ignore).
b) detects any endless loop by using an assert of the duration.
c) asserts to prevent any wrap-around.
When any assert occurs, the program does not terminate normally.
When this code finishes normally, it means:
a) no disproven numbers found, and
b) the running time < 80 seconds, so no endless loop occurred and
c) no wrap-around occurred
Notes about Recursive checkMagicR(uint64_t n):
3 forms of n are available in the recursion:
formal parameter n - typical of recursion calls
auto var nxt - receives the computed value of n for next
recursive call
data attribute: m_N - the starting point of the currently running
recursion effort