
I understand that code will be slower, but why so much? How do I code to avoid this slowdown?

std::unordered_map uses other containers internally, and those containers use iterators. In a debug build, _ITERATOR_DEBUG_LEVEL defaults to 2, which turns on iterator debugging. Sometimes my code is not affected much, and sometimes it runs extremely slowly.
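
For reference, a minimal check (assuming MSVC; <yvals.h>, pulled in by any standard header, supplies the default) that prints the level a translation unit was actually compiled with:

// Minimal sketch: print the iterator debug level this translation unit
// was built with (defaults to 2 in debug, 0 in release on MSVC).
#include <iostream>

int main()
{
    std::cout << "_ITERATOR_DEBUG_LEVEL = " << _ITERATOR_DEBUG_LEVEL << std::endl;
    return 0;
}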

I can speed my example up by setting _ITERATOR_DEBUG_LEVEL=0 under project properties >> C/C++ >> Preprocessor >> Preprocessor Definitions. But as this link suggests, I cannot do that in my real project: I get conflicts with MSVCMRTD.lib, which contains std::basic_string built with _ITERATOR_DEBUG_LEVEL=2. I understand I can work around the problem by statically linking to the CRT, but I would prefer not to if I can fix the code so the problem does not arise.

I can make changes that improve the situation, but I am just trying things out without understanding why they work. For example, as written, the first 1000 inserts run at full speed; if I change O_BYTE_SIZE to 1, the first inserts are as slow as everything else. That looks like a small change (not necessarily a good one).

This, this, and this also shed some light, but don't answer my question.

I am using Visual Studio 2010 (this is legacy code). I created a Win32 console app and added the code below.

Main.cpp

#include "stdafx.h"


#include "OString.h"
#include "OTHashMap.h"

#include <cstdio>
#include <cstring>   // strcmp
#include <ctime>
#include <iostream>

// Hash and equal operators for map
class CRhashKey {
public:
   inline unsigned long operator() (const OString* a) const { return a->hash(); }
};

class CReqKey {
public:
    inline bool operator() (const OString& x, const OString& y) const { return strcmp(x.data(),y.data()) != 0; }
    inline bool operator() (const OString* x, const OString& y) const { return operator()(*x,y); }
    inline bool operator() (const OString& x, const OString* y) const { return operator()(x,*y); }
    inline bool operator() (const OString* x, const OString* y) const { return operator()(*x,*y); }
};


int _tmain(int argc, _TCHAR* argv[])
{
    const int CR_SIZE = 1020007;

    CRhashKey h;
    OTPtrHashMap2<OString, int, CRhashKey, CReqKey> *code_map = 
        new OTPtrHashMap2 <OString, int, CRhashKey, CReqKey>(h, CR_SIZE);

    const clock_t begin_time = clock();

    for (int i=1; i<=1000000; ++i)
    {
        char key[10];
        sprintf(key, "%d", i);

        code_map->insert(new OString(key), new int(i));

        //// Check hash values
        //OString key2(key);
        //std::cout << i << "\t" << key2.hash() << std::endl;

        // Check timing
        if ((i % 100) == 0)
        {
            std::cout << i << "\t" << float(clock() - begin_time) / CLOCKS_PER_SEC << std::endl;
        }
    }

    std::cout << "Press enter to exit" << std::endl;
    char buf[256];
    std::cin.getline(buf, 256);

    return 0;
}

OTHashMap.h

#pragma once

#include <fstream>
#include <unordered_map>    

template <class K, class T, class H, class EQ>
class OTPtrHashMap2
{
    typedef std::unordered_map<K*,T*,H,EQ>                              OTPTRHASHMAP_INTERNAL_CONTAINER;
    typedef typename OTPTRHASHMAP_INTERNAL_CONTAINER::iterator          OTPTRHASHMAP_INTERNAL_ITERATOR;

public:
    OTPtrHashMap2(const H& h, size_t defaultCapacity) : _hashMap(defaultCapacity, h) {}

    bool insert(K* key, T* val)
    {
        // unordered_map::insert returns pair<iterator, bool>; second is true if a new element was inserted
        std::pair<OTPTRHASHMAP_INTERNAL_ITERATOR, bool> retVal = _hashMap.insert(std::make_pair(key, val));
        return retVal.second;
    }

    OTPTRHASHMAP_INTERNAL_CONTAINER _hashMap;

private:
};

OString.h

#pragma once

#include <cstring>   // memcpy
#include <string>

class OString
{
public:
    OString(const std::string& s) : _string (s) { } 
    ~OString(void) {}

    static unsigned hash(const OString& s) { return unsigned (s.hash()); }
    unsigned long hash() const
    {
        unsigned hv = static_cast<unsigned>(length());
        size_t i = length() * sizeof(char) / sizeof(unsigned);
        const char * p = data();
        while (i--) {
            unsigned tmp;
            memcpy(&tmp, p, sizeof(unsigned));
            hashmash(hv, tmp);
            p = p + sizeof(unsigned);
        } 
        if ((i = length() * sizeof(char) % sizeof(unsigned)) != 0)  {
            unsigned h = 0;
            const char* c = reinterpret_cast<const char*>(p);
            while (i--)
            {
                h = ((h << O_BYTE_SIZE*sizeof(char)) | *c++);
            }
            hashmash(hv, h);
        }
        return hv; 
    }

    const char* data() const { return _string.c_str(); }
    size_t length() const    { return _string.length(); }


private:
    std::string _string;

    //static const unsigned O_BYTE_SIZE = 1;
    static const unsigned O_BYTE_SIZE = 8;
    static const unsigned O_CHASH_SHIFT = 5;

    inline void hashmash(unsigned& hash, unsigned chars) const
    {
        hash = (chars ^
                ((hash << O_CHASH_SHIFT) |
                 (hash >> (O_BYTE_SIZE*sizeof(unsigned) - O_CHASH_SHIFT))));
    }
};
  • I have seen 100X slow downs myself in debug builds versus release. Not necessarily the specific case you point out however. – drescherjm Jul 10 '19 at 00:07
  • "_How do I code to avoid this slowdown?_" Why do you need to? The code, in production, will always be compiled in Release mode, hence, the user will never see such slowdown. – Algirdas Preidžius Jul 10 '19 at 00:26
  • Release build, debug symbols, optimization off either globally or selectively with pragmas. – Retired Ninja Jul 10 '19 at 00:32
  • @AlgirdasPreidžius - My real project loads a large map. It takes ~ 1 minute in release, and more than an hour in debug. Developers just can't work with that. – mmesser314 Jul 10 '19 at 01:06
  • Any way you can bundle that bit of nastiness up into one module that is compiled with optimizations on and a "Here Be Dragons!" warning for those who need to delve into its inner workings? – user4581301 Jul 10 '19 at 02:08
  • @RetiredNinja - Thanks. Helpful. – mmesser314 Jul 10 '19 at 17:25
  • @user4581301 - Sorry, no. Our company security blocks access to dropbox and yousendit. – mmesser314 Jul 10 '19 at 21:29
  • I think you misunderstand my comment. I was suggesting you turn the code in question into an optimized library that is used by everybody else's unoptimized code. You wall off the slow stuff and pray the other developers never have to debug it. If they do, then they suffer the slow. Otherwise zoom, relatively speaking. – user4581301 Jul 10 '19 at 21:36
  • @user4581301 - Ah, much more sensible. Good point. Thanks. – mmesser314 Jul 11 '19 at 00:20

1 Answer


I found enough of an answer. Collisions are the source of the slowdown.

Edit 2: Another fix is to add this around the #include in main.cpp:

// Iterator debug checking makes the Microsoft implementation of std containers 
// *very* slow in debug builds for large containers. It must only be undefed around 
// STL includes. Otherwise we get linker errors from the debug C runtime library, 
// which was built with _ITERATOR_DEBUG_LEVEL set to 2. 
#ifdef _DEBUG
#undef _ITERATOR_DEBUG_LEVEL
#endif

#include <unordered_map>

#ifdef _DEBUG
#define _ITERATOR_DEBUG_LEVEL 2
#endif

Edit: The fix is to switch to boost::unordered_map.
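
A minimal sketch of that change in OTHashMap.h (assuming Boost's headers are on the include path; boost::unordered_map takes the same template parameters and has a (bucket count, hasher) constructor, so the rest of the class is unchanged):

// Sketch: swap the internal container for boost::unordered_map.
// Only the include and the first typedef change; the existing
// _hashMap(defaultCapacity, h) constructor call still compiles.
#include <boost/unordered_map.hpp>

template <class K, class T, class H, class EQ>
class OTPtrHashMap2
{
    typedef boost::unordered_map<K*, T*, H, EQ>                      OTPTRHASHMAP_INTERNAL_CONTAINER;
    typedef typename OTPTRHASHMAP_INTERNAL_CONTAINER::iterator       OTPTRHASHMAP_INTERNAL_ITERATOR;
    // ... remainder identical to the std::unordered_map version ...
};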

std::unordered_map is defined in <unordered_map>. It inherits from _Hash, defined in <xhash>.

_Hash contains this (highly abbreviated)

template<...> 
class _Hash
{
    typedef list<typename _Traits::value_type, ...> _Mylist;
    typedef vector<iterator, ... > _Myvec;

    _Mylist _List;  // list of elements, must initialize before _Vec
    _Myvec _Vec;    // vector of list iterators, begin() then end()-1
};

All values are stored in _List.

_Vec is a vector of iterators into _List. It divides _List into buckets. _Vec has an iterator to the beginning and end of each bucket. Thus, if the map has 1M buckets (distinct key hashes), _Vec has 2M iterators.

When a key/value pair is inserted into the map, it usually lands in an empty bucket. The value is pushed onto the beginning of the list, and the hash of the key gives the location in _Vec where the two new iterators are put. This is quick because they point to the beginning of the list.

If the bucket is already occupied (a hash collision), the new value must be inserted next to the existing values in _List. This requires inserting an element in the middle of the list, and existing iterators must be updated. Apparently this requires a lot of work when iterator debugging is enabled. The code is in <list>, but I did not step through it.
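
A rough way to see how many collisions a given hash actually produces, independent of the timing, is to query the container's bucket interface. This is a standalone sketch using std::hash<std::string> in place of OString::hash; build it in release or the inserts themselves will crawl:

// Sketch: count occupied buckets and the largest bucket after inserting
// the same 1M numeric-string keys used above.
#include <cstdio>
#include <iostream>
#include <string>
#include <unordered_map>

int main()
{
    std::unordered_map<std::string, int> m(1020007);
    for (int i = 1; i <= 1000000; ++i)
    {
        char key[16];
        sprintf(key, "%d", i);
        m.insert(std::make_pair(std::string(key), i));
    }

    size_t occupied = 0, largest = 0;
    for (size_t b = 0; b < m.bucket_count(); ++b)
    {
        size_t s = m.bucket_size(b);
        if (s > 0)       ++occupied;
        if (s > largest) largest = s;
    }

    std::cout << "buckets: "          << m.bucket_count()
              << "  occupied: "       << occupied
              << "  largest bucket: " << largest << std::endl;
    return 0;
}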


To get an idea of how much work that is, I tried some nonsense hash functions that would be terrible in practice, but that produce either many collisions or few collisions when inserting.

Added to OString.h

static unsigned hv2;

// Never collides. Always uses the next int as the hash
unsigned long hash2() const
{
    return ++hv2;
}

// Almost never collides. Almost always gets the next int. 
// Gets the same int 1 in 200 times. 
unsigned long hash3() const
{
    ++hv2;
    unsigned long lv = (hv2*200UL)/201UL;
    return (unsigned)lv;
}

// A best practice hash
unsigned long hash4() const
{
    std::hash<std::string> hasher;
    return hasher(_string);
}

// Always collides. Everything into bucket 0. 
unsigned long hash5() const
{
    return 0;
}

Added to main.cpp

// Hash and equal operators for map
class CRhashKey {
public:
   //inline unsigned long operator() (const OString* a) const { return a->hash(); }
   //inline unsigned long operator() (const OString* a) const { return a->hash2(); }
   //inline unsigned long operator() (const OString* a) const { return a->hash3(); }
   //inline unsigned long operator() (const OString* a) const { return a->hash4(); }
   inline unsigned long operator() (const OString* a) const { return a->hash5(); }
};

unsigned OString::hv2 = 0;

The results were dramatic. With iterator debugging enabled, no realistic hash is going to be fast enough.

  • hash2 - Never collide - 1M inserts in 15.3 sec
  • hash3 - Almost never - 1M inserts in 206 sec
  • hash4 - Best practice - 100k inserts in 132 sec, and getting slower as collisions became more frequent. 1M inserts would take > 1 hour
  • hash5 - Always collide - 1k inserts in 48 sec, or 1M inserts in ~13 hours

My choices are

  • Release build with debug symbols and optimization turned off, as Retired Ninja suggests
  • Statically link the CRT so I can turn off _ITERATOR_DEBUG_LEVEL; this would also solve some other similar issues
  • Change from unordered_map to a sorted vector (see the sketch after this list)
  • Something else. Suggestions welcome.
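
A minimal sketch of the sorted-vector option mentioned above: build the vector once, sort it, then look up with std::lower_bound (the names here are illustrative only):

// Sketch of a sorted-vector lookup: sort once, then binary search.
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Entry;

struct EntryLess {
    bool operator()(const Entry& a, const Entry& b) const { return a.first < b.first; }
};

// Call std::sort(entries.begin(), entries.end(), EntryLess()) once after loading.
int* find_value(std::vector<Entry>& sorted_entries, const std::string& key)
{
    std::vector<Entry>::iterator it = std::lower_bound(
        sorted_entries.begin(), sorted_entries.end(), Entry(key, 0), EntryLess());
    if (it != sorted_entries.end() && it->first == key)
        return &it->second;
    return 0;   // not found
}

In a debug build this avoids the per-insert list bookkeeping of the hash map, although checked vector iterators still add some overhead with _ITERATOR_DEBUG_LEVEL=2.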
  • How about changing from `std::unordered_map` to `std::map`? The latter doesn't use hashing and therefore can't be affected by hash collisions, but in other ways behaves not too differently from a `std::unordered_map` (the main difference being that it keeps its entries in a sorted tree using the `<` operator of the key-type) – Jeremy Friesner Jul 10 '19 at 04:21
  • @JeremyFriesner - Thanks. Good thought. – mmesser314 Jul 10 '19 at 17:24