
I am trying to rehash() my HashTable every time I get a collision, but I keep getting a Java heap space error.

Basically, I have a String[] table whose length I want to multiply by 2 every time I have a collision in my hash.

Edit: I am calling insert() in a while loop which loads around 300,000 words into the hash table.

    public void rehash() {
        String[] backup = table;
        size = size * 2;
        // I get the error on the line below
        table = new String[size];
        System.out.println("size" + size);
        for (int i = 0; i < backup.length; i++) {
            if (backup[i] != null) {
                insert(backup[i]);
            }
        }
    }

    public void insert(String str) {

        int index = hashFunction(str);

        if (index > size || table[index] != null) {
            rehash();
        }

        table[index] = str;
    }

My hash function:

    public int hashFunction(String s) {
        int val = 0;
        val = s.hashCode();
        if (val < 0) {
            val *= -1;
        }

        while (val > this.size) {
            val %= this.size;
        }

        return val;
    }


 public void load() {
        String str = null;
        try {
            BufferedReader in = new BufferedReader(new FileReader(location));
            while ((str = in.readLine()) != null) {
                insert(str);
            }
            in.close();
        } catch (Exception e) {
            System.out.println("exception");
        }
    }
  • Did you check this? http://stackoverflow.com/questions/434989/hashmap-intialization-parameters-load-initialcapacity – Raúl Apr 11 '15 at 21:18

3 Answers


From the hash function you have posted it is not clear what it returns, but it looks like it has an issue.

int index = hashFunction(str);

Here, if your index is not proper, your code ends up doing a lot of recursive new String[size] allocations. Put a counter or a debug point here and check.

    if (index > size || table[index] != null) {
        rehash();
    }
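
For example, a rough way to add that counter (rehashCount here is an assumed extra field, not something from your code):

    // Hypothetical debugging aid: count how many times rehash() runs and print
    // how big the table is about to get. rehashCount is an assumed new field.
    private int rehashCount = 0;

    public void rehash() {
        rehashCount++;
        System.out.println("rehash #" + rehashCount + ", next size = " + (size * 2));
        // ... rest of the original rehash() body ...
    }
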
rakesh99
  • I have created a counter just before rehash(); and the result is 7. Do you need any other info? I didn't get what's missing exactly. I have posted my hash function in the question; it uses String.hashCode() with some editing right after. – Kaan Apr 11 '15 at 20:40
  • Your hash function returns a value which is not posted here. Also, what was the initial size of your table? If it should not have been rehashed 7 times for the number of Strings you inserted, then your hash function has a bug. Otherwise use -Xmx to provide more memory. – rakesh99 Apr 11 '15 at 20:47
  • The hash function returns a result for each of the 300,000+ words. The initial size of my array is 514751, but I don't want any collision to happen. Should I try a different hash function to fix this problem? – Kaan Apr 11 '15 at 20:49
  • The initial size of your array is 514751!? And you are putting in 300,000 words and getting out of memory? Your hash function has an issue. Post the code which shows the variables s and hash in it. – rakesh99 Apr 11 '15 at 20:54
  • Added the load() method to my question. – Kaan Apr 11 '15 at 21:03
  • No, I meant the statement 'return hash' in your hash function. See what the value of hash is there. From your snippet it looks like you need 'return val'. Also, your initial capacity is too high. Are you sure you are running with sufficient memory to allocate a million strings? Though that is just a waste here! – rakesh99 Apr 11 '15 at 21:22
  • It is originally val in my program; I edited that in my question. I kept the initial size big because it is certain there will be collisions. – Kaan Apr 11 '15 at 21:28
  • The size with which you are initializing is in the hundreds of thousands. If you need far fewer strings, it would be good to bring the initial size down. If the return value is proper, a rehash should never happen for that many words, unless you are trying to put in duplicate strings, in which case your code fails and rehashes. You need an additional check, str.equals(table[index]), to avoid the rehash. – rakesh99 Apr 11 '15 at 21:37

No matter how big you make the table, you cannot completely avoid collisions. Try this program, for example:

System.out.println("Aaa".hashCode());
System.out.println("AbB".hashCode());
System.out.println("BBa".hashCode());
System.out.println("BCB".hashCode());

The output is:

65569
65569
65569
65569

They are four different strings with exactly the same hashcode. Exact collisions of this sort are not even that rare. (The hash algorithm used by the Java String class is not actually a very good one, but it is kept for backwards compatibility reasons.)

So, making the hashtable bigger (using a larger portion of the hashcode) reduces the number of collisions, but will never completely prevent them, because sometimes the hashcodes for different values are exactly the same.

A hashtable must be prepared to deal with a limited number of collisions by being able to store a set of different values in a single slot of the table. This is typically done by using a linked list for the values that map to the same slot. The current implementation of java.util.HashMap does something more advanced: if values with the same hashcode implement the Comparable interface (as String does), it uses that to arrange them in a binary tree. There is also a technique called dynamic perfect hashing, where collisions are prevented by dynamically changing the hash algorithm to ensure each distinct value gets a distinct hash, but that is more complex.
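
To make the chaining idea concrete, here is a minimal sketch; the class name, Node, buckets, and the 0.75 threshold are illustrative assumptions, not the asker's code and not HashMap's actual implementation:

    // Illustrative separate-chaining sketch: each bucket holds a small linked
    // list of entries whose hashes map to the same index.
    public class ChainedStringSet {
        private static class Node {
            final String value;
            Node next;
            Node(String value, Node next) { this.value = value; this.next = next; }
        }

        private Node[] buckets = new Node[16];
        private int count = 0;

        public void insert(String str) {
            int index = indexFor(str);
            // Walk the chain; if the string is already present there is nothing to do.
            for (Node n = buckets[index]; n != null; n = n.next) {
                if (n.value.equals(str)) {
                    return;
                }
            }
            buckets[index] = new Node(str, buckets[index]);
            count++;
            // Grow when the load factor passes a threshold, not on every collision.
            if (count > buckets.length * 0.75) {
                resize();
            }
        }

        private void resize() {
            Node[] old = buckets;
            buckets = new Node[old.length * 2];
            for (Node head : old) {
                for (Node n = head; n != null; n = n.next) {
                    int index = indexFor(n.value);
                    buckets[index] = new Node(n.value, buckets[index]);
                }
            }
        }

        private int indexFor(String s) {
            // Mask off the sign bit, then reduce modulo the current capacity.
            return (s.hashCode() & Integer.MAX_VALUE) % buckets.length;
        }
    }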

A few other issues I see in your code:

  • There is no need to initialize val with 0 if you immediately assign something else to it on the next line. You can instead do int val; val = s.hashCode(); or simply int val = s.hashCode();.

  • The check: if (val < 0) val *= -1; is not completely reliable because if val is exactly equal to Integer.MIN_VALUE, multiplying it by -1 overflows and produces Integer.MIN_VALUE as the result. To completely prevent negative values, mask out the integer's sign bit by doing val &= Integer.MAX_VALUE;.

  • The condition here is wrong: while (val > this.size) val %= this.size;. It should be val >= this.size. However, there is no need to loop at all: doing the modulo operation once, unconditionally, with no while/if is enough. Alternatively, if you maintain the table size as an exact power of 2, you can implement the mod operation as val &= (size - 1);, which is a little faster and also fulfills the requirement of ensuring the result is non-negative, unlike %. (See the sketch after this list.)

  • In the insert method it would have to be if (index >= size ..., not if (index > size ..., but actually there is no need for that check at all, if the hash function already ensures the hash is in range.

  • When the table slot is already occupied, you need to check if it already contains the same string you are trying to insert (in which case you can return from the method immediately) and not just assume it's a different value with a collision.
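
Putting the hash-function points together, a hedged sketch of what that could look like (it reuses the asker's size field; everything else here is an assumption, not the original code):

    // Sketch of the hash function with the above fixes applied.
    public int hashFunction(String s) {
        int val = s.hashCode() & Integer.MAX_VALUE; // clear the sign bit; safe even for Integer.MIN_VALUE
        return val % size;                          // single reduction; result is always in [0, size)
    }

    // If size is kept as an exact power of two, the reduction can instead be
    // written as: return s.hashCode() & (size - 1);
    // which is slightly faster and already guarantees a non-negative result.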

Boann

From the HashMap javadoc:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

If you know that the map will be used to store approximately N records, a good initialCapacity would be N/.75 + N/10, allowing for a variance of 10%.

  • It's OK to get an OutOfMemory error, but it's not OK to program for rehashing; try your best to avoid it.
  • For rehashing, you shouldn't wait for a collision. From the HashMap class:

This (resize) method is called automatically when the number of keys in this map reaches its threshold

where threshold = (int)(capacity * loadFactor);
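
As a rough illustration of sizing the map up front (the 300,000 figure comes from the question; the class and variable names here are just for the example):

    import java.util.HashMap;
    import java.util.Map;

    public class CapacityExample {
        public static void main(String[] args) {
            int n = 300_000;                                  // approximate word count from the question
            // Choose a capacity so that n entries stay below threshold = capacity * loadFactor.
            int initialCapacity = (int) (n / .75) + n / 10;   // N/.75 + N/10, ~10% headroom
            Map<String, String> words = new HashMap<>(initialCapacity);
            System.out.println("initialCapacity = " + initialCapacity);
            // With this capacity, inserting ~300,000 entries should never trigger a rehash.
            words.put("example", "example");
        }
    }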

Raúl