
I am writing some code in C++ for a class assignment that requires working with a multiprecision library such as Boost.Multiprecision. Basically, I need to build a hash table of some large integers and then look up a certain value in that table.

When I use the values of h, g and p that are commented out, the code runs fine and very quickly. Once I switch to the ones that are not commented out, it throws a memory exception at the line: boost::unordered::unordered_map<cpp_int, cpp_int, hash_str<cpp_int>>::iterator got = mp.find( lkp ); I am just starting out with C++ and am pretty sure that something is way off, because this should run rather quickly, even with large numbers.

#include <boost/unordered_map.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/math/special_functions/pow.hpp>

using namespace std;
using namespace boost::multiprecision;

template <typename T>
struct hash_str
{
    size_t operator()( const T& t ) const
    {
        return std::hash<std::string>()
            ( t.str() );
    }
};

int main()
{
    boost::unordered_map<cpp_int, cpp_int, hash_str<cpp_int>> mp;
    //boost::unordered_map<hash_str<cpp_int>, cpp_int, hash_str<cpp_int>> mp;
    cpp_int k;
    cpp_int h( "3239475104050450443565264378728065788649097520952449527834792452971981976143292558073856937958553180532878928001494706097394108577585732452307673444020333" );
    cpp_int g( "11717829880366207009516117596335367088558084999998952205599979459063929499736583746670572176471460312928594829675428279466566527115212748467589894601965568" );
    //cpp_int g = 1010343267;
    //cpp_int h = 857348958;
    //cpp_int p = 1073676287;
    cpp_int p( "13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084171" );
    int b = pow( 2, 20 );
    cpp_int denom;
    cpp_int inv = powm( g, p - 2, p );

    //building a hash table of all values h/g^x1
    for ( cpp_int x = 1; x < b; ++x )

    {
        // go through all 2^20 values up to b, calculate the function h/g^x1, 
        // then hash it to put into table

        denom = powm( inv, x, p );
        k = ( h *denom ) % p;
        mp.insert( std::make_pair( k, x ) );


    }
    cpp_int lkp;
    for ( int v = 1; v < b; ++v )
    {
        //cpp_int gb = pow(g, b);
        lkp = powm( g, v*b, p );
        //looking for a match for g^b^x0 in map mp; when found we need to find x 
        //which is x1 and then calc 'x'
        boost::unordered::unordered_map<cpp_int, cpp_int, hash_str<cpp_int>>::iterator got = mp.find( lkp );
        // Check if iterator points to end of map or if we found our value
        if ( got != mp.end() )
        {
            std::cout << "Element Found - ";
            //std::cout << got->first << "::" << got->second << std::endl;
        }
        /*else
        {
        std::cout << "Element Not Found" << std::endl;
        }*/
    }
    return 0;

}

Just in case, here is the exception I get: Unhandled exception at 0x768F2F71 in MiM.exe: Microsoft C++ exception: boost::exception_detail::clone_impl > at memory location 0x0109EF5C.

Daniel S
  • Well, these are pretty large numbers, which probably exhaust the memory available to a 32-bit process. Try building for the x64 platform. – zett42 Feb 08 '18 at 10:12
  • Perhaps I should just hex and hash those numbers and then put them into the unordered_map. Can anyone recommend a fast hash for very very large integers? – Daniel S Feb 08 '18 at 12:17

1 Answer

The hash function is pretty atrocious because it allocates a temporary string only to hash it. The string will be roughly bits × log(2)/log(10) bytes long, which is up to 155 characters for the numbers here.

The point of the hash is that it's a relatively fast way to compare numbers. With a hash that expensive, you're better off with a regular tree container (e.g. std::map<>).

  • I haven't checked your formulas (especially around h/g^x1, because I'm not even sure that x represents x1). Outside of that issue,
  • I think there is a correctness issue: v * b is computed in int, and with both factors going up to 2^20 the product overflows a 32-bit int.

I've cleaned it up a little, and it runs:

#include <boost/functional/hash.hpp> // boost::hash_range
#include <boost/math/special_functions/pow.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/unordered_map.hpp>
#include <chrono>
#include <iostream> // std::cout

namespace bmp = boost::multiprecision;
using namespace std::chrono_literals;
using Clock = std::chrono::high_resolution_clock;

template <typename T> struct hash_str {
    size_t operator()(const T &t) const { return std::hash<std::string>()(t.str()); }
};

template <typename T> struct hash_bin {
    size_t operator()(const T &t) const {
        return boost::hash_range(t.backend().limbs(), t.backend().limbs()+t.backend().size());
    }
};
int main() {
    using bmp::cpp_int;
    boost::unordered_map<cpp_int, cpp_int, hash_bin<cpp_int> > mp;
#if 1
    cpp_int const h("32394751040504504435652643787280657886490975209524495278347924529719819761432925580738569379585531805328"
            "78928001494706097394108577585732452307673444020333");
    cpp_int const g("11717829880366207009516117596335367088558084999998952205599979459063929499736583746670572176471460312928"
            "594829675428279466566527115212748467589894601965568");
    cpp_int const p("13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690"
            "031858186486050853753882811946569946433649006084171");
#else
    cpp_int const g = 1010343267;
    cpp_int const h = 857348958;
    cpp_int const p = 1073676287;
#endif
    int constexpr b   = 1 << 20;
    cpp_int const inv = powm(g, p - 2, p);

    {
        auto s = Clock::now();

        // building a hash table of all values h/g^x1
        for (cpp_int x = 1; x < b; ++x) {
            // go through [1, b), calculate the function h/g^x1,
            // then hash it to put into table

            cpp_int denom = powm(inv, x, p);
            cpp_int k = (h * denom) % p;
            mp.emplace(std::move(k), x);
        }

        std::cout << "Built map in " << (Clock::now() - s)/1.0s << "s\n";
    }

    {
        auto s = Clock::now();

        for (cpp_int v = 1; v < b; ++v) {
            //std::cout << "v=" << v << " b=" << b << "\n";
            // cpp_int gb = pow(g, b);
            cpp_int const lkp = powm(g, v * b, p);

            // looking for a match for g^b^x0 in map mp; when found we need to find x
            // which is x1 and then calc 'x'
            auto got = mp.find(lkp);

            // Check if iterator points to end of map or if we found our value
            if (got != mp.end()) {
                std::cout << "Element Found - ";
                //std::cout << got->first << " :: " << got->second << "\n";
            }
        }
        std::cout << "Completed queries in " << (Clock::now() - s)/1.0s << "s\n";
    }
}

It runs in 1m4s for me.

Built map in 24.3809s
Element Found - Completed queries in 39.2463s
...

Using hash_str instead of hash_bin takes 1m13s:

Built map in 30.3923s
Element Found - Completed queries in 42.488s
sehe
  • Thank you, sehe, this is enormously helpful. Yes, x is x1; I just changed the terminology and forgot to update the comment. For some reason, though, I am experiencing the same thing with your code: it just takes forever to run. In fact, the version before the cleanup would at least get to the second loop, whereas this one just sits in the first one. I am wondering if there is some other issue going on with my compiler (I am using the free version of Visual Studio 2017). – Daniel S Feb 08 '18 at 17:07
  • With b reduced to 2^16, it runs on [MSVC (Version 19.00.23506 for x64) online: Built map in 6.05741s Completed queries in 9.75423s](http://rextester.com/HOJS37569) (that's with the large factors). Did you enable "Release mode" (i.e. optimizations)? – sehe Feb 08 '18 at 17:10
  • Pardon my ignorance, but when I went to Configuration and set to Release, Visual Studio gave me 27 errors (cannot open source file etc.). – Daniel S Feb 08 '18 at 17:18
  • Oh, that's another question then. But there's your biggest hurdle right now. Search SO/MSDN for "Build Configurations", and "Property Sheets". That's not my area of expertise because I don't like using Visual Studio. – sehe Feb 08 '18 at 17:19
  • Thank you so much again! I was able to run it in Debug mode by turning on optimization. Of course, it was the Debug mode without optimization that was killing the memory. Really Really appreciated. – Daniel S Feb 08 '18 at 18:48
  • Could you tell me out of your tweaks, what were the most important? I would like to learn. – Daniel S Feb 08 '18 at 18:49
  • I think correctness comes first: that overflow is real, and makes the program either abort because of a negative exponent or continue with [Undefined Behaviour](https://en.wikipedia.org/wiki/Undefined_behavior). – sehe Feb 08 '18 at 18:49
  • In terms of performance, I have shown you the differences. They're mild; hashing didn't dominate the runtime in _my_ tests. Still, hashing without allocation is obviously preferable. Then, `emplace` with `std::move` has the potential to speed up building the hash table. You could simply measure the difference it makes. – sehe Feb 08 '18 at 18:51
  • Re: "Thanks so much!" - Welcome to SO :) I hope you like it here. See [also](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) – sehe Feb 08 '18 at 18:52