C++, memory and arrays. Creating my own hashmap for exercise. Unexpected data left in memory?

Question

So i'm trying to create a pretty specific for my needs hashmap for a small project where i'm trying to learn c++. I have the following code:

template<class T>
class HashMap
{
public:
  HashMap();
  virtual ~HashMap();
  void add(T value);
  T get(T *value);
private:
  int hash(T *data);
  T _hashes[26]; //I want a fixed size here
};

template<class T>
HashMap<T>::HashMap()
{
  for(int i = 0; i < 26; i++)
    this->_hashes[i] = T();
}

template<class T>
HashMap<T>::~HashMap()
{
  //Don't really have anything to delete here?
}

template<class T>
int HashMap<T>::hash(T *dat)
{
  //Super simple, just to try things out
  return (long int) dat % 26;
}

template<class T>
T HashMap<T>::get(T *val)
{
  int idx = this->hash(val);
  cout << idx << endl;
  //Probably somewhere here i get my problem
  if(this->_hashes[idx])
    return this->_hashes[idx];
  return T();
}

template<class T>
void HashMap<T>::add(T val)
{
  //Should probably do some check if there's already an element here.
  this->_hashes[this->hash(&val)] = val;
}

The problem im having is that this compiles fine but when I do something like this in my main.cpp:

HashMap<char> a = HashMap<char>();
a.add('h');
a.add('c');
a.add('g');
char *b = new char {'c'};
cout << a.get(b) << endl;
delete b;

It usually returns the id, ie:

and an empty line which is just a empty char. (the output of the function is in the get() method), but sometimes it will show me something like this:

18
g

instead of 18 and an empty line. My question is why this happens and how i can prevent it? Does it have something to do with memory not being 'nulled' when it's deleted but just free for other programs to take and then i don't initialise it correctly? Also, if you have the time please point out any mistakes or not so good to do things in the code.

If it's of any interest im using GCC Debian 4.4.5-8 to compile and compile it with g++ -g file.cpp -o file

Thankful for any help!

It's probably preferable to use `std::array< T , 26> _hashes` instead of `T _hashes[26]` . Also in the destructor you should delete `_hashes` — shuttle87, Dec 25 '11 at 13:39
Why are you returning T() in get method. Sorry if I didn't understood something very evident here. — Shadow, Dec 25 '11 at 13:39
@shuttle87 I've never even heard of this std class, do you have a link to some documentation? Thanks for the tip! Found some now, reading into it! — lfxgroove, Dec 25 '11 at 13:43
@Shadow Just to return something that an uninitialised value, should perhaps just give a pointer to null or something. Haven't really thought of that part. — lfxgroove, Dec 25 '11 at 13:44
The array is in the header `` , it should be part of any setup with a recent standard library. — shuttle87, Dec 25 '11 at 13:45
What you are doing _not_ a hash table. It is something like... mm... set. Hash table assumes that you have pair of `key` and `value`. And you get `value` by `key`. Hash table looks like an array where index is not integer, but your own key type. — Lol4t0, Dec 25 '11 at 13:47
Ah, then i've kind of misunderstood the concept of a hashtable, thanks! — lfxgroove, Dec 25 '11 at 14:15

Mat · Answer 1 · 2011-12-25T13:57:05.790

3

The behavior you see is normal: if you get a value that you put in your hash, it will be displayed by your main. What is giving you surprising results is your hash function:

return (long int) dat % 26;

This hashes the dat pointer, not the T that dat points to. Try with:

return *dat % 26;

(Or just use a standard std::set.)

Another problem with your code:

T _hashes[26]; //I want a fixed size here   (a

and

this->_hashes = new T[26];                   (b

are incompatible. Either use the fixed array (a) and you don't need to allocate it (b), or use a plain pointer (T *_hashes) and do (b) - I'm surprised your compiler accepts what you have. If you use (a) you don't need anything in the destructor. If you use (b), you need to delete [] in your destructor.

Passing a T* in get but a T in set is a bit strange too.

edited Dec 25 '11 at 13:57

answered Dec 25 '11 at 13:38

Mat

202,337
40
393
406

The second bit i forgot to edit out, my bad. Fixed now, thanks! Just a follow up question, since it hashes the dat pointer i need the long int if i'm on a 64bit machine, otherwise a normal int would have sufficed? Also, how would i use a `std::set` in the hash function? – lfxgroove Dec 25 '11 at 13:54
2

You must **not** hash the pointer, you must hash the `T` (otherwise your structure will be completely useless). Since you only use a modulo, you're restricted to `T`s for which the modulo is defined - the (int) cast I put is probably unnecessary now that I think about it. You wouldn't use a `std::set` in your hash function, you would use that to replace your `HashTable` which isn't a hash table. – Mat Dec 25 '11 at 13:56
I understood that, but is the long int because i'm on a 64-bit machine? Oh, okay. I'll read up on it then, thanks! – lfxgroove Dec 25 '11 at 14:13
1

Whether a pointer has the same bit-length as an `int` or a `long int` on your platform is irrelevant here. If you need an integral type of the same width as a (data) pointer, use `uintprt_t` (but you shouldn't need that very often, if at all). – Mat Dec 25 '11 at 14:15
I know it's irrelevant but i would like to know if that's the reason i had to use long int to not loose precision when doing the cast? – lfxgroove Dec 25 '11 at 14:20
Find what the size of `*dat` was and find the size of `int` on your system. If the size of `*dat` was bigger than the size of `int` you have your answer... – shuttle87 Dec 25 '11 at 14:22
@Anton: No, you were doing a cast on the pointer itself (the address) when you should have been dereferencing that pointer. What your hash must hash is the value you're trying to store ('c' for instance), not the bit pattern of the address of where that char is stored in memory. – Mat Dec 25 '11 at 14:23
I think im to unclear about what im trying to ask now so i'll just try what shuttle87 said x) Thanks for all the help! – lfxgroove Dec 25 '11 at 14:25

shuttle87 · Accepted Answer · 2011-12-25T14:18:38.680

Here's a more idiomatic c++ implementation:

#include <array>
#include <iostream>
#define MAGIC_NUMBER 26 //rename this to something else preferably

template<class T>
class HashMap
{
    public:
        HashMap();
        virtual ~HashMap(){};
        void add(T value);
        T get(T *value);//potentially confusing that add and get take different types
    private:
        int hash(T *data);
        std::array<T, MAGIC_NUMBER>  _hashes; //I want a fixed size here
};

template<class T>
HashMap<T>::HashMap()
{
    std::fill(_hashes.begin(),_hashes.end(), T());
}

template<class T>
int HashMap<T>::hash(T *dat)
{
    //Super simple, just to try things out
    return (static_cast<int>(*dat)) % MAGIC_NUMBER;//prefer using c++ casts
}

template<class T>
T HashMap<T>::get(T *val) //this is strange, you pass in a pointer and get a non-pointer
{
  int idx = this->hash(val);
  std::cout << idx << std::endl;
  if(this->_hashes[idx])
    return this->_hashes[idx];
  return T();
}

template<class T>
void HashMap<T>::add(T val)
{
  //Should probably do some check if there's already an element here.
  this->_hashes[this->hash(&val)] = val;
}

int main(void){
    HashMap<char> a = HashMap<char>();
    a.add('h');
    a.add('c');
    a.add('g');
    char *b = new char {'c'};
    std::cout << a.get(b) << std::endl;
    delete b;
}

Note you will need to compile with c++0x or c++11 features to get the usage of the std::array class. One of the main benefits of the array class is that you get more safety with the memory allocation than just a plain c-style array.

Right now you might want to reconsider some elements of the design. In particular it is confusing that add and get, have different types. Also this isn't what people generally think about when they hear hashmap, this structure is more like a set.

Also as a coding standards note, if you are prefixing your member variables it's a bit tautological to also use this-> to access them.

I'll rename it to a set instead then! Should i return a pointer in the get method to make it more clear? Also, i shouldn't take a pointer in the add cause then the element can be deleted by someone else later? — lfxgroove, Dec 25 '11 at 14:24
With regards to the get method, just make sure you comment/document it well, because it's not intuitive that it works the way it does without looking at the implementation. If you are worried about memory management then you'd do well to hide the actual pointers in the implementation. Generally speaking if you are coding idiomatic c++ it's good to use RAII and other memory management as much as possible. If you are really worried about contents of the container being deleted you might want to look into shared_ptr. — shuttle87, Dec 25 '11 at 14:33

C++, memory and arrays. Creating my own hashmap for exercise. Unexpected data left in memory?

2 Answers2