Questions about implementing my own HashMap in Java

Question

I am working on an assignment where I have to implement my own HashMap. In the assignment text it is being described as an Array of Lists, and whenever you want to add an element the place it ends up in the Array is determined by its hashCode. In my case it is positions from a spreadsheet, so I have just taken columnNumber + rowNumber and then converted that to a String and then to an int, as the hashCode, and then I insert it that place in the Array. It is of course inserted in the form of a Node(key, value), where the key is the position of the cell and the value is the value of the cell.

But I must say I do not understand why we need an Array of Lists, because if we then end up with a list with more than one element, will it not increase the look up time quite considerably? So should it not rather be an Array of Nodes?

Also I have found this implementation of a HashMap in Java:

public class HashEntry {
      private int key;
      private int value;

      HashEntry(int key, int value) {
            this.key = key;
            this.value = value;
      }     

      public int getKey() {
            return key;
      }

      public int getValue() {
            return value;
      }
}

public class HashMap {
  private final static int TABLE_SIZE = 128;

  HashEntry[] table;

  HashMap() {
        table = new HashEntry[TABLE_SIZE];
        for (int i = 0; i < TABLE_SIZE; i++)
              table[i] = null;
  }

  public int get(int key) {
        int hash = (key % TABLE_SIZE);
        while (table[hash] != null && table[hash].getKey() != key)
              hash = (hash + 1) % TABLE_SIZE;
        if (table[hash] == null)
              return -1;
        else
              return table[hash].getValue();
  }

  public void put(int key, int value) {
        int hash = (key % TABLE_SIZE);
        while (table[hash] != null && table[hash].getKey() != key)
              hash = (hash + 1) % TABLE_SIZE;
        table[hash] = new HashEntry(key, value);
  }
}

So is it correct that the put method, looks first at the table[hash], and if that is not empty and if what is in there has not got the key, being inputted in the method put, then it moves on to table[(hash + 1) % TABLE_SIZE]. But if it is the same key it simply overwrites the value. So is that correctly understood? And is it because the get and put method use the same method of looking up the place in the Array, that given the same key they would end up at the same place in the Array?

I know these questions might be a bit basic, but I have spend quite some time trying to get this sorted out, why any help would be much appreciated!

Edit

So now I have tried implementing the HashMap myself via a Node class, which just constructs a node with a key and a corresponding value, it has also got a getHashCode method, where I just concatenate the two values on each other.

I have also constructed a SinglyLinkedList (part of a previous assignment), which I use as the bucket.

And my Hash function is simply hashCode % hashMap.length.

Here is my own implementation, so what do you think of it?

package spreadsheet; 

public class HashTableMap {

  private SinglyLinkedListMap[] hashArray;
  private int size;


  public HashTableMap() {
    hashArray = new SinglyLinkedListMap[64];
    size = 0;  
  }


  public void insert(final Position key, final Expression value) {

      Node node = new Node(key, value); 
      int hashNumber = node.getHashCode() % hashArray.length;       
      SinglyLinkedListMap bucket = new SinglyLinkedListMap();
      bucket.insert(key, value);
      if(hashArray[hashNumber] == null) {
        hashArray[hashNumber] = bucket;
        size++; 
      }
      if(hashArray[hashNumber] != null) {
        SinglyLinkedListMap bucket2 = hashArray[hashNumber];
        bucket2.insert(key, value);
        hashArray[hashNumber] = bucket2;
        size++; 
      }
      if (hashArray.length == size) {
          SinglyLinkedListMap[] newhashArray = new SinglyLinkedListMap[size * 2];
      for (int i = 0; i < size; i++) {
          newhashArray[i] = hashArray[i];
      }
      hashArray = newhashArray;
    }
  } 

  public Expression lookUp(final Position key) {
      Node node = new Node(key, null); 
      int hashNumber = node.getHashCode() % hashArray.length;
      SinglyLinkedListMap foundBucket = hashArray[hashNumber];
      return foundBucket.lookUp(key); 
  }
 }

The look up time should be around O(1), so I would like to know if that is the case? And if not how can I improve it, in that regard?

The collision/performance questions are addressed in the wikipedia article. — keyser, Jan 28 '13 at 18:31
They talk about hash collisions, but he is also having key collisions, which will not work without changing key encoding. — Mel Nicholson, Jan 28 '13 at 18:38

score 9 · Answer 1 · answered Jan 28 '13 at 18:30

You have to have some plan to deal with hash collisions, in which two distinct keys fall in the same bucket, the same element of your array.

One of the simplest solutions is to keep a list of entries for each bucket.

If you have a good hashing algorithm, and make sure the number of buckets is bigger than the number of elements, you should end up with most buckets having zero or one items, so the list search should not take long. If the lists are getting too long it is time to rehash with more buckets to spread the data out.

score 1 · Answer 2 · answered Jan 28 '13 at 18:30

It really depends on how good your hashcode method is. Lets say you tried to make it as bad as possible: You made hashcode return 1 every time. If that were the case, you'd have an array of lists, but only 1 element of the array would have any data in it. That element would just grow to have a huge list in it.

If you did that, you'd have a really inefficient hashmap. But, if your hashcode were a little better, it'd distribute the objects into many different array elements and as a result it'd be much more efficient.

The most ideal case (which often isn't achievable) is to have a hashcode method that returns a unique number no matter what object you put into it. If you could do that, you wouldn't ever need an array of lists. You could just use an array. But since your hashcode isn't "perfect" it's possible for two different objects to have the same hashcode. You need to be able to handle that scenario by putting them in a list at the same array element.

But, if your hashcode method was "pretty good" and rarely had collisions, you rarely would have more than 1 element in the list.

Mel Nicholson · Answer 3 · 2013-01-28T18:35:46.223

The Lists are often referred to as buckets and are a way of dealing with collisions. When two data elements have the same hash code mod TABLE SIZE they collide, but both must be stored.

A worse kind of collision is two different data point having the same key -- this is disallowed in hash tables and one will overwrite the others. If you just add row to column, then (2,1) and (1,2) will both have a key of 3, which means they cannot be stored in the same hash table. If you concatenated the strings together without a separator then the problem is with (12,1) versus (1, 21) --- both have key "121" With a separator (such as a comma) all the keys will be distinct.

Distinct keys can land in the same buck if there hashcodes are the same mod TABLE_SIZE. Those lists are one way to store the two values in the same bucket.

Hi I have included my implementation of the HashMap, in my original post, so if someone would take a look at that I would be more than happy :) The look up time should be around O(1), so I would like to know if that is the case? And if not how can I improve it, in that regard? — Per John, Jan 31 '13 at 21:15

score 0 · Answer 4 · answered Jan 28 '13 at 18:42

class SpreadSheetPosition {
    int column;
    int row;

    @Override
    public int hashCode() {
        return column + row;
    }
}

class HashMap {
    private Liat[] buckets = new List[N];

    public void put(Object key, Object value) {
        int keyHashCode = key.hashCode();
        int bucketIndex = keyHashCode % N;
        ...
    }
}

Compare having N lists, with having just one list/array. For searching in a list one has to traverse possibly the entire list. By using an array of lists, one at least reduces the single lists. Possibly even getting a list of one or zero elements (null).

If the hashCode() is as unique as possible the chance for an immediate found is high.

Questions about implementing my own HashMap in Java

4 Answers4