I'm playing around with the Boost Multiprecision library, calculating some big factorials and such.

The problem is that the output takes too long: 100,000! takes 0.5 seconds to calculate and 11 seconds to print, and 1,000,000! takes half an hour (yes, for the output alone).

I'm using cout for the output and redirecting it to a file with >: ./prg > file

I tried converting to a string first, using a stringstream, and plain cout. They all take the same time.

The conversion to a string just takes very long. Is there any way to speed it up?

Code example:

#include <iostream>
#include <chrono>
#include <string>
#include <boost/multiprecision/cpp_int.hpp>

int main() {
    uint32_t num = 100000;
    boost::multiprecision::cpp_int result = 1;

    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();

    for (uint32_t i = 2; i <= num; i++) {
        result *= i;
    }

    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    std::cout << "calculation: " << std::chrono::duration_cast<std::chrono::milliseconds> (end - begin).count() / 1000.0 << " sec" << std::endl;

    std::string s = result.str();

    std::chrono::steady_clock::time_point endOutput = std::chrono::steady_clock::now();
    std::cout << "toString: " << std::chrono::duration_cast<std::chrono::milliseconds> (endOutput - end).count() / 1000.0 << " sec" << std::endl;

    std::cout << "length: " << s.length() << std::endl;

    return 0;
}

Output:

calculation: 1.014 sec
toString: 7.643 sec
length: 456574

Output of the same code in Java using BigInteger:

calculation: 2.646 sec
toString: 0.466 sec
length: 456574
Richard
  • Can you post a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example), and some system info, so we can check exactly what's going on? Which [numerical back-end](https://www.boost.org/doc/libs/1_72_0/libs/multiprecision/doc/html/boost_multiprecision/intro.html) are you using, GMP, or other? You might want to compare with a C program running GMP's [`mpz_fac_ui`](https://gmplib.org/manual/Number-Theoretic-Functions), compiled on your machine, to get an idea of the best-case benchmark. – Arc Mar 26 '22 at 22:12
  • @Arc added example code that shows the issue – Richard Mar 27 '22 at 06:24
  • First, did you enable optimizations? Calculating the factorial involves multiplications and additions only, which are very fast. Converting to a decimal string, OTOH, requires a lot of divisions, which are the slowest of the basic operations. Try printing in hexadecimal instead. – phuclv Mar 27 '22 at 06:29
  • @phuclv Yes, -O3. As hex it's 5.5 seconds instead of 7.6. Better, but still very slow compared to Java. – Richard Mar 27 '22 at 06:49
  • 1) Try `boost::multiprecision::mpz_int` (including boost/multiprecision/gmp.hpp and linking with -lgmp) to see a very fast toString. 2) Go to https://github.com/boostorg/multiprecision/issues and ask for a faster conversion from cpp_int to string. 3) Using gmpxx.h, you can just use `mpz_class::factorial(num).get_str()` and get the full thing in less than 30ms. – Marc Glisse Mar 27 '22 at 08:43
  • @MarcGlisse mpz_int takes 0.024 seconds for the above output. That's fast enough. Thank you very much! – Richard Mar 27 '22 at 11:41
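
The point about divisions in the comments above can be made concrete with a small sketch. This is not cpp_int's actual implementation. It is a toy big integer on a plain vector of 32-bit limbs, and the names `Limbs`, `div_small`, `to_decimal` and `to_hex` are made up for illustration. It shows why a schoolbook decimal conversion grows quadratically with the number of limbs, while a hex conversion can in principle be done in a single linear pass without any division.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Little-endian base-2^32 limbs; a stand-in for a big integer, not Boost's type.
using Limbs = std::vector<std::uint32_t>;

// Divide the number in place by a small divisor and return the remainder.
// Every limb is visited, so a single call costs O(n).
std::uint32_t div_small(Limbs& n, std::uint32_t d) {
    std::uint64_t rem = 0;
    for (std::size_t i = n.size(); i-- > 0;) {
        std::uint64_t cur = (rem << 32) | n[i];
        n[i] = static_cast<std::uint32_t>(cur / d);
        rem  = cur % d;
    }
    while (!n.empty() && n.back() == 0) n.pop_back();  // drop leading zero limbs
    return static_cast<std::uint32_t>(rem);
}

// Schoolbook decimal conversion: peel off 9 decimal digits per pass.
// Roughly n passes of O(n) work each, hence O(n^2) overall.
std::string to_decimal(Limbs n) {
    while (!n.empty() && n.back() == 0) n.pop_back();
    if (n.empty()) return "0";
    std::string out;  // digits are collected least significant first
    while (!n.empty()) {
        std::uint32_t chunk = div_small(n, 1000000000u);
        for (int i = 0; i < 9; ++i) {
            out.push_back(static_cast<char>('0' + chunk % 10));
            chunk /= 10;
        }
    }
    while (out.size() > 1 && out.back() == '0') out.pop_back();  // padding zeros
    std::reverse(out.begin(), out.end());
    return out;
}

// Hexadecimal: every limb maps to 8 digits independently, so this is O(n).
std::string to_hex(const Limbs& n) {
    static const char* digits = "0123456789abcdef";
    std::string out;
    for (std::size_t i = n.size(); i-- > 0;)
        for (int shift = 28; shift >= 0; shift -= 4)
            out.push_back(digits[(n[i] >> shift) & 0xF]);
    auto first = out.find_first_not_of('0');
    return first == std::string::npos ? "0" : out.substr(first);
}
```

For example, `to_decimal(Limbs{0u, 1u})` converts 2^32 and `to_hex(Limbs{0u, 1u})` gives its hex form. Libraries such as GMP avoid the quadratic cost with divide-and-conquer conversion for large operands, which is a plausible reason for the gap reported here.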

1 Answer

I was going to recommend you post a self-answer. But then I had this crammed into a single comment:

feel free to self-answer @Richard, so the hint will help others. Also, a pet-peeve of mine: C++ doesn't need to look horrible: https://godbolt.org/z/53d338843. In fact I'd go further and make a lap function https://godbolt.org/z/6nf659Y91. Also nice: https://wandbox.org/permlink/3KCKBn1tOwwJCkIz

So I figured that I might as well post it here for added value of demonstration code.

Modernizing The Repro

We can express that program a lot more cleanly, this time using the GMP-backed mpz_int:

#include <boost/multiprecision/gmp.hpp>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>

using namespace std::chrono_literals;
namespace bmp = boost::multiprecision;
auto now      = std::chrono::steady_clock::now;

auto factorial(uint32_t num) {
    bmp::mpz_int result{1};
    while (num)
        result *= num--;
    return result;
}

int main() {
    auto start = now();
    auto result = factorial(100'000);

    auto mid = now();
    std::cout << "calculation: " << (mid - start) / 1.s << "s" << std::endl;

    std::string s = result.str();

    std::cout << "toString: "    << (now() - mid) / 1.s << "s" << std::endl;
    std::cout << "length: "      << s.length()          << "\n";
}

Which may print something like

calculation: 2.17467s
toString: 0.0512504s
length: 456574
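
The "lap function" mentioned earlier could be sketched roughly as below. This is a guess at the idea, not the code behind the linked Godbolt example, and `LapTimer` and `lap()` are hypothetical names. It only relies on `std::chrono::steady_clock` from the standard library.

```cpp
#include <chrono>

// Hypothetical helper: each call to lap() returns the seconds elapsed since
// the previous call (or since construction) and restarts the measurement.
class LapTimer {
    using Clock = std::chrono::steady_clock;
    Clock::time_point last_ = Clock::now();

  public:
    double lap() {
        auto now = Clock::now();
        std::chrono::duration<double> elapsed = now - last_;
        last_ = now;  // the next lap starts here
        return elapsed.count();
    }
};
```

With something like this, the two measurements in main reduce to a `timer.lap()` call after the factorial and another after `result.str()`, without having to keep `start` and `mid` around.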

More Comparative Benchmarks

To see how much better mpz_int may perform, let's benchmark it against cpp_int:

Live On Wandbox

#include <boost/multiprecision/cpp_int.hpp>
#include <boost/multiprecision/gmp.hpp>
#include <chrono>
#include <iostream>

using namespace std::chrono_literals;
namespace bmp = boost::multiprecision;

template <typename T> T factorial(uint32_t num) {
    T result{1};
    while (num)
        result *= num--;
    return result;
}

#define TIMED(expr)                                                            \
    [&]() -> decltype(auto) {                                                  \
        using C = std::chrono::steady_clock;                                   \
        using namespace std::chrono_literals;                                  \
        struct X {                                                             \
            C::time_point s = C::now();                                        \
            ~X() {                                                             \
                std::cerr << std::fixed << (C::now() - s) / 1.s << "s\t"       \
                          << #expr << std::endl;                               \
            }                                                                  \
        } x;                                                                   \
        return (expr);                                                         \
    }()

template <typename T> void bench() {
    auto r = TIMED(factorial<T>(100'000));
    auto s = TIMED(r.str());
    std::cout << "length: " << s.length() << "\n";
}

int main() {
    TIMED(bench<bmp::mpz_int>());
    std::cout << "-----\n";
    TIMED(bench<bmp::cpp_int>());
}

Which may print something like

0.953427s       factorial<T>(100'000)
0.040691s       r.str()
length: 456574
0.994284s       bench<bmp::mpz_int>()
-----
1.410608s       factorial<T>(100'000)
8.014350s       r.str()
length: 456574
9.425064s       bench<bmp::cpp_int>()

As you can see, GMP's string conversion is about two orders of magnitude faster (~200x for the str() operation in my test run), while the factorial itself is only about 1.5x faster.

sehe