5

I've been using STL's unordered_map recently and while it seems to work nicely I don't quite understand how the hashing function works given that the data type is given as a template parameter. In an effort to understand this data structure more thoroughly, I implemented my own little Hashmap class in C++ :

Hashmap interface:

#ifndef _HASHMAP_H_
#define _HASHMAP_H_

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include <vector.h>


//Beginning of Hashmap class definition

template <class Key, class Value>
class Hashmap{
private:

int mappedElementCount;



public:
explicit Hashmap();
virtual ~Hashmap();


virtual void test();

virtual int hash(Key*);

int* getSize();

void putKVPair(Key*,Value*);

void clearMap();


//When we use these methods, we'll want a linear vector of keys and values to 
    //iterate over, so vector is good here
std::vector<Key>* getKeys();
std::vector<Value>* getValues();

}; //end hashmap class definition
#endif /*_HASHMAP_H_*/

Hashmap implementation:

#include "Hashmap.h"

template<class Key,class Value> Hashmap<Key,Value>::Hashmap(){
mappedElementCount = 0;
}
template<class Key,class Value> Hashmap<Key,Value>::~Hashmap(){
printf("\nDestroying the base Hashmap object!\n");
}

template<class Key,class Value> void Hashmap<Key,Value>::test(){
printf("The size of our Key is %i and the size of our Value is
    %i\n",sizeof(Key),sizeof(Value));
}


template<class Key,class Value> int Hashmap<Key,Value>::hash(Key* k_ptr){

    unsigned int hashval;

    /* we start our hash out at 0 */
    hashval = 0;

        //TODO: How do we generate a hash signature when we don't know what data type 
        //we're going to be working with?

    return hashval % mappedElementCount;

}

template<class Key,class Value> std::vector<Key>* Hashmap<Key,Value>::getKeys(){
//TODO: prepare a vector initialized with all Key objects and return it here
return keys;    
}

template<class Key,class Value> std::vector<Value>* Hashmap<Key,Value>::getValues(){
//TODO: prepare a vector initialized with all Value objects and return it here
return values;  
}

template<class Key,class Value> int* Hashmap<Key,Value>::getSize(){
return &mappedElementCount;
}

template<class Key,class Value> void Hashmap<Key,Value>::putKVPair(Key* k, Value* v){
    //TODO: implement hashing of the key object k to determine
    //the address of the value object v

    //first step, generate a hash from our key
    int tempHash = hash(k);

       //TODO: store the Value at an address given by or influenced by tempHash

    //If all was successfully completed, increment the mapped records counter
    mappedElementCount++;
}



template<class Key,class Value> void Hashmap<Key,Value>::clearMap(){
//TODO: implement a cascading chain of deallocation of stored objects within the 
    //hashmap
//MAYBE-- only if we create new objects rather than just mapping reference 
    //associations,
//which is really the goal here...  In the latter case, just empty the Hashmap 
    //itself
}

One possible OOP method of addressing this problem is to use Hashmap as a base class and provide derived classes that have known Key data types, such as the following Stringmap:

Stringmap interface:

#ifndef _STRINGMAP_H_
#define _STRINGMAP_H_

#include "Hashmap.h"

template <class Value>
class Stringmap:public Hashmap<std::string,Value>{
private:

public:
//Con/de 'structors
explicit Stringmap();
~Stringmap();

//Here we know our Key will be of type std::string
//so we can generate our hash sig by char values
    //Override hash from the base class
int hash(std::string*);

//override test from base class
void test();


};
#endif /*_STRINGMAP_H_ def*/

Stringmap implementation:

#include "Stringmap.h"

template<class Value> Stringmap<Value>::Stringmap():Hashmap<std::string,Value>(){

}
template<class Value> Stringmap<Value>::~Stringmap(){
printf("\nDestroying the derived stringmap object!\n");
}

template<class Value> void Stringmap<Value>::test(){
printf("The size of our Value is %i\n",sizeof values[0]);
}

template<class Value> int Stringmap<Value>::hash(std::string* str_ptr){

    unsigned int hashval;

    /* we start our hash out at 0 */
    hashval = 0;


    /* for each character, we multiply the old hash by 31 and add the current
     * character.  Remember that shifting a number left is equivalent to
     * multiplying it by 2 raised to the number of places shifted.  So we
     * are in effect multiplying hashval by 32 and then subtracting hashval.
     * Why do we do this?  Because shifting and subtraction are much more
     * efficient operations than multiplication.
     */
    for(int i=0;i<str_ptr->length();i++) {
        hashval = (*(str_ptr))[i] + ((hashval << 5) - hashval);
    }

    /* we then return the hash value mod the hashmap size so that it will
     * fit into the necessary range
     */
    return hashval % (*(Hashmap<std::string,Value>::getSize()));

}

So the question is as follows: is it possible to create a hash signature when the data type to be hashed is currently unknown? If so, how? Looking at the std::hash docs, it appears that the C++ standard simply defines a hash function for each primitive data type and also for T* (for any type T)... What's missing is how this hashing is implemented for the given primitive data types and, more to the point, how it is implemented for the generic T*. I suppose I could just call hash(Key) and hope for the best, but it would be nice to understand what's going on behind the scenes.

thanks, CCJ

CCJ
  • 1,619
  • 26
  • 41
  • Take a look on http://stackoverflow.com/questions/1138742/looking-for-a-good-hash-table-implementation-in-c – Mihai8 Jan 21 '13 at 19:57
  • Why not just look at the implementation of `std::unordered_map`? – Seth Carnegie Jan 21 '13 at 20:00
  • 1
    The T* implementation starts by converting the pointer to a void* and then to an integer, effectively discarding the type. – rici Jan 21 '13 at 20:00
  • @rici aha! So they cheat XD I suspected it might be something like that – CCJ Jan 21 '13 at 20:26
  • @ccj: it's not cheating. pointer equality is well-defined between objects of the same type, and containers of pointers are containers of pointers, not containers of pointed-to's. There's no need to look at the target of the pointer. If you want a associative container (ordered or unordered) to order/hash by the target of a pointer, you have to provide an explicit comparison/hash template parameter. (Technically, ordered associative containers of pointers can only hold keys which point into the same aggregate. fwiw.) – rici Jan 21 '13 at 20:32
  • @rici what do you mean here by pointer equality? What does your last sentence 'ordered associative containers of pointers can only hold keys which point into the same aggregate' mean (perhaps related to the need for a common comparator in an ordered-associative container)? – CCJ Jan 21 '13 at 23:33
  • @CCJ: two pointers are equal if the `==` operator returns `true`. (§ 5.10(1) "Pointers of the same type (after pointer conversions) can be compared for equality. Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address." ... – rici Jan 22 '13 at 00:28
  • @CCJ, and comparision of pointers with `<` is specified by 5.9(2): "If two pointers p and q of the same type point to different objects that are not members of the same object or elements of the same array or to different functions, or if only one of them is null, the results of pq, p<=q, and p>=q are unspecified." So if you do "Foo* a = new Foo(); Foo* b = new Foo();" then `a < b` and `b < a` are unspecified; I know of no wording which guarantees them to be distinct. The default comparison operator for ordered containers is `std::less` which just uses `operator<` – rici Jan 22 '13 at 00:36
  • @ccj, ok I take the comment about ordered associative containers back. I just found that `std::less` is valid (20.8.5: "For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not." – rici Jan 22 '13 at 00:50

2 Answers2

5

std::unorderd_map takes 2 explicit template parameters (Key, and Value) and also has a bunch of hidden template parameters, of which the Hash function is defaulted to std::hash<Key>.

This STL hash function std::hash<Key> takes a Key and returns a std::size_t. It is already specialized for all the integral types and std::string. From this reference site

The hash template defines a function object that implements a hash function. Instances of this function object define an operator() that:

  1. Accepts a single parameter of type Key.
  2. Returns a value of type size_t that represents the hash value of the parameter.
  3. Does not throw exceptions when called.
  4. For two parameters k1 and k2 that are equal, std::hash()(k1) == std::hash()(k2).
  5. For two different parameters k1 and k2 that are not equal, the probability that std::hash()(k1) == std::hash()(k2) should be very small, approaching 1.0/std::numeric_limits::max().

The hash template is both CopyConstructible and Destructible. The unordered associative containers std::unordered_set, std::unordered_multiset, std::unordered_map, std::unordered_multimap use specializations of the template std::hash as the default hash function.

The reference ends with this quote:

** The actual hash functions are implementation-dependent and are not required to fulfill any other quality criteria except those specified above. **

So you can look at the implementation of your system, but that won't guarantee anything for other systems' implementation.

TemplateRex
  • 69,038
  • 19
  • 164
  • 304
3

There's a std::hash<T> template which is specialized for a variety of types, and which you can specialize for your own types.

By default, std::unordered_map<T> simply delegates hashing to std::hash<T> (or you can specify a different hash function as a template argument).

Thus std::unordered_map does not need to know anything at all about the mechanics of hashing.

As to how std::hash is implemented, that's not specified. However, I think it's reasonable to assume that any decent compiler would provide a good-quality implementation. One gotcha to bear in mind is that std::hash<char*> doesn't hash the C string, it only hashes the value of the pointer (been there :))

NPE
  • 486,780
  • 108
  • 951
  • 1,012