
I am wrapping TBB's concurrent_hash_map to compare its performance against a suite of other concurrent hash tables.

However, the performance I get out of it is horrendous; I just can't believe it is that slow compared to the other concurrent hash tables.

Here is my implementation of it:

#include <tbb/concurrent_hash_map.h>
#include <utility>  // std::make_pair

class TBB : public TestDs {
    typedef tbb::concurrent_hash_map<int, int, HashCompare<int> > hash_t;

private:
    hash_t _ds;

public:
    TBB(const Configuration& config) : _ds(config.initial_count) {}

    bool containsKey(int key) {
        hash_t::accessor a;  // accessor takes an exclusive lock on the entry until it goes out of scope
        return _ds.find(a, key);
    }

    int get(int key) {
        hash_t::accessor a;
        return _ds.find(a, key) ? a->second : 0;
    }

    int put(int key, int value) {
        // insert() returns true only if a new entry was created
        return _ds.insert(std::make_pair(key, value));
    }

    int remove(int key) {
        return _ds.erase(key);
    }

    int size() {
        return _ds.size();
    }

    const char* name() {
        return "TBB";
    }

    void print() {}
    void shutdown() {}
};
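
(Aside: the lookups above use a plain `accessor`, which takes an exclusive per-entry lock. TBB also provides `const_accessor`, which takes a shared reader lock, and `count()`, which holds no lock after it returns. A minimal sketch of lock-friendlier read paths, assuming the same `hash_t`/`_ds` members as above:)

bool containsKey(int key) {
    return _ds.count(key) > 0;    // no accessor held at all
}

int get(int key) {
    hash_t::const_accessor a;     // shared (reader) lock instead of exclusive
    return _ds.find(a, key) ? a->second : 0;
}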

Does anyone see any issue with my implementation, or know of any reason why it may perform so slowly? It takes over 30 minutes to insert 200,000 elements in a single-threaded environment. To put that in perspective, nearly all the other tables complete this test in less than 5 minutes.

Here are my build flags:

-w  -DNDEBUG -g -msse2 -m32  -DINTEL -D_REENTRANT -lrt -pthread -fno-strict-aliasing -l cds -l tbb -lllalloc 
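
(As the comments below point out, there is no optimization flag in that command; a benchmarking build would normally add one, e.g. `-O3`, keeping the rest unchanged:)

-O3 -w -DNDEBUG -g -msse2 -m32 -DINTEL -D_REENTRANT -lrt -pthread -fno-strict-aliasing -l cds -l tbb -lllalloc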

UPDATE: I have adjusted my testing code to prefill the hash tables with 1,000 elements instead of 100,000. Running it again, TBB performs 92 ops/sec while another implementation performs 89,431 ops/sec (64-thread environment)... Just saying, something doesn't seem right.

Additional info: the computer is an HP Z600 workstation with 6 GB of RAM and 6 cores.

Note: cross-posted at http://software.intel.com/en-us/forums/showthread.php?t=86119

Steven Feldman
  • You're compiling without optimizations enabled? – jalf Sep 16 '11 at 20:39
  • And what is the "other implementation"? – jalf Sep 16 '11 at 20:41
  • Yes, without optimizations, but that is for a different reason. The other hash tables are libcds's split list and Michael hash map, as well as my own hash table. – Steven Feldman Sep 16 '11 at 20:59
  • It doesn't matter what the reason is; comparing performance with optimizations disabled makes no sense. You might as well compare the speed of two cars when you've taken the wheels off them. – jalf Sep 16 '11 at 21:04
  • @gman, https://github.com/MrStevenFeldman/Concurrent-Non-Blocking-HashTable/tree/Testing/HashTableTester/test – Steven Feldman Sep 16 '11 at 21:05
  • Yes, but when optimizing concurrent data structures, the reordering of operations can be very dangerous to correctness. – Steven Feldman Sep 16 '11 at 21:07
  • @Steven: That has nothing to do with debugging. If turning optimizations on breaks your code, it's your code that was broken in the first place. – GManNickG Sep 16 '11 at 21:11
  • I added the -O3 flag; it made very little improvement in the TBB test run. Are there any other recommended changes, or configuration of TBB itself? – Steven Feldman Sep 16 '11 at 21:22
  • @Steven: I don't have time to look too deeply at it, hopefully someone else can, but look at it like this: the TBB data structures are not just thrown-together hobbyist code; they were programmed by some of the best and thoroughly tested. If you're having performance problems, either the structures you're comparing against are broken (for example, don't properly handle thread-safety), or your test code doesn't reflect how the data structure is supposed to be used in practice. I really highly suspect it's not TBB's fault. Try to see if you can simplify the tests and identify problems. – GManNickG Sep 16 '11 at 21:57
  • I never thought it was TBB's fault; it's that I had spent a week trying to isolate why it was so slow, and I had overlooked the hash function hashing to a constant. – Steven Feldman Sep 16 '11 at 22:28

1 Answer


Your HashCompare::hash() returns sizeof(int), which, I guess, means every entry maps to the same bucket. It seems like you are not using it as a hash table, but more as a linked list.
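
In other words, the test harness presumably defines something like the following (a reconstruction based on the behavior described, not the verbatim code from the linked repo):

template<typename K>
struct HashCompare {
    // sizeof(int) is a compile-time constant, so every key gets the same hash value
    static size_t hash( const K& key )                  { return sizeof(int); }
    static bool   equal( const K& key1, const K& key2 ) { return ( key1 == key2 ); }
};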

You could try using Boost's hash:

#include <boost/functional/hash.hpp>

template<typename K> 
struct HashCompare { 
    static size_t hash( const K& key )                  { return boost::hash_value(key); } 
    static bool   equal( const K& key1, const K& key2 ) { return ( key1 == key2 ); } 
}; 
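
Alternatively, if you don't need a custom hash at all, you can drop the third template parameter entirely; in the classic TBB API, `tbb::concurrent_hash_map` defaults it to `tbb::tbb_hash_compare<Key>`, which hashes integers sensibly:

typedef tbb::concurrent_hash_map<int,int> hash_t;  // uses tbb::tbb_hash_compare<int>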
foxcub
  • Where do you see the implementation of `HashCompare`? – jalf Sep 16 '11 at 21:58
  • In https://github.com/MrStevenFeldman/Concurrent-Non-Blocking-HashTable/blob/Testing/HashTableTester/test/main.cpp – foxcub Sep 16 '11 at 21:59
  • OMG THANKS! That makes so much sense! – Steven Feldman Sep 16 '11 at 22:07
  • You are welcome. I hope it works. I'd be curious to know how it actually compares to other implementations. – foxcub Sep 16 '11 at 22:08
  • The hash compare function was a copy-and-paste from a testing program I had found elsewhere. It never occurred to me to check the hashing! – Steven Feldman Sep 16 '11 at 22:14
  • If you want to try other hash functions, Google recently released CityHash ( http://code.google.com/p/cityhash/ ). – Max Lybbert Sep 16 '11 at 22:18
  • @Steven: I'm sure we will all appreciate updated performance results, at least if TBB now shows better results than the other implementations. – Andriy Tylychko Sep 18 '11 at 19:18
  • Yes, TBB now performs better in cases where table modifications (insert/update/remove) are less than 70% of total operations. Anything higher, however, and my table performs much better (at 90%, we perform twice the ops per second). Currently I am rewriting the algorithm to remove a few state variables, and making other improvements. – Steven Feldman Sep 24 '11 at 14:58
  • As a side note for those looking over the code, the probing version of the algorithm will no longer be developed, because of a significant loss in performance due to extra overhead, and no significant memory was saved by probing. (Though it was a much more interesting and challenging wait-free algorithm.) – Steven Feldman Sep 24 '11 at 15:00