I am trying to write demo code to show that rehashing happens in a HashMap when the map size exceeds the load factor threshold. How can I prove that rehashing is happening internally? I also want to prove that even though the old entries are moved to new buckets during a rehash, I can still get the old elements using the old keys (let me know whether my assumption is correct). Below is the sample code.

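import java.util.*;

class RehashDemo {
    public static void main(String[] args) {
        // A requested capacity of 10 is rounded up to a table size of 16,
        // so the resize threshold is 16 * 0.75 = 12 entries.
        Map<Integer, String> numbers = new HashMap<>(10);
        for (int i = 0; i < 10; i++) {
            numbers.put(i, i + "");
        }
        System.out.println(numbers); // 10 entries: still below the threshold

        // These six puts bring the size to 16, which exceeds the threshold and triggers a resize.
        for (int j = 15; j <= 20; j++) {
            numbers.put(j, j + "");
        }
        System.out.println(numbers);
    }
}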
JavaUser
  • why not look at the source code? – Sharon Ben Asher Apr 22 '19 at 07:09
  • The structures don't "change"! They are thrown away and the exact same structure is recreated, just bigger. This is an implementation detail and not visible from the outside. – Boris the Spider Apr 22 '19 at 07:09
  • If we increase the number of buckets, is there a chance that the hash code will change? Am I correct? – JavaUser Apr 22 '19 at 07:11
  • You can run your code with a debugger to see that some of the entries are moved to new buckets after the HashMap is rehashed. – Eran Apr 22 '19 at 07:11
  • @Eran , but I need a simple program to explain this. – JavaUser Apr 22 '19 at 07:11
  • @JavaUser I don't think you can demonstrate it with the output of a program, since it's an implementation detail. – Eran Apr 22 '19 at 07:12
  • @Eran, during a rehash, will the hash code change for the old entries? If yes, how will the old value be retrieved using the new hash code? – JavaUser Apr 22 '19 at 07:16
  • @JavaUser the hash code doesn't change. It just gets remapped to a (possibly) different bucket, since the bucket is determined by taking the hash code modulo the number of buckets. So hash code 17 will originally get mapped to bucket 1, but after the first rehash it will be mapped to bucket 17. – Eran Apr 22 '19 at 07:18

1 Answer

It's not difficult to write a program to demonstrate rehashing, but you have to understand a lot about HashMap's internal organization, how objects' hashcodes are generated, how hashcodes are related to HashMap's internal structures, and how this affects iteration order.

Briefly, HashMap consists of an array of buckets (the "table"). Each bucket is a linked list of key-value pairs. A pair whose key hashes to a bucket that's already occupied is added to the end of that bucket's linked list. The bucket is determined by calling the key's hashCode() method, XORing the result with its high-order 16 bits (that is, with itself unsigned-right-shifted by 16; see source), and then taking the result modulo the table size. Since the table size is always a power of two, this amounts to ANDing with a mask of (tablesize-1). The hash code of an Integer object is simply its integer value (source). Finally, iterating a HashMap steps through the buckets sequentially, and sequentially through the linked list of pairs within each bucket.

After all that, you can see that small integer values will end up in corresponding buckets. For example, Integer.valueOf(0).hashCode() is 0. It remains 0 after the shift-and-XOR, and 0 modulo any table size is still 0. Thus, Integer 0 ends up in bucket 0, Integer 1 ends up in bucket 1, and so forth. But don't forget that the bucket is taken modulo the table size. So if the table size is 8, Integer 8 will end up in bucket 0.
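The bucket calculation described above can be sketched in a few lines. This is only an illustration of the arithmetic, not HashMap's actual code: the class name BucketIndexSketch and the methods spread and bucketIndex are hypothetical names invented here, and the sketch assumes non-null keys and a power-of-two table size.

```java
final class BucketIndexSketch {
    // Hypothetical helper mirroring the shift-and-XOR step:
    // fold the high-order 16 bits into the low-order 16 bits.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Assumes tableSize is a power of two, so masking with (tableSize - 1)
    // gives the same result as taking the modulus. Assumes key is non-null.
    static int bucketIndex(Object key, int tableSize) {
        return spread(key.hashCode()) & (tableSize - 1);
    }

    public static void main(String[] args) {
        // Print the bucket each Integer key would land in for table sizes 8 and 16.
        for (int key : new int[] {0, 1, 8, 9, 17}) {
            System.out.println(key + " -> bucket " + bucketIndex(key, 8)
                    + " (size 8), bucket " + bucketIndex(key, 16) + " (size 16)");
        }
    }
}
```

Note that the hash code of each key is the same for both table sizes. Only the mask, and therefore the bucket, differs.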

With this information, we can populate a HashMap with Integer keys that will end up in predictable buckets. Let's create a HashMap with a table size of 8 and the default load factor of 0.75, meaning that we can add six mappings before rehashing occurs.

Map<Integer, Integer> map = new HashMap<>(8);
map.put(0, 0);
map.put(8, 8);
map.put(1, 1);
map.put(9, 9);
map.put(2, 2);
map.put(10, 10);

{0=0, 8=8, 1=1, 9=9, 2=2, 10=10}

Printing out the map (essentially, using its toString() method) iterates the map sequentially as described above. We can see that 0 and 8 end up in the first bucket, 1 and 9 in the second, and 2 and 10 in the third. Now let's add another entry:

map.put(3, 3);

{0=0, 1=1, 2=2, 3=3, 8=8, 9=9, 10=10}

The iteration order changed! Adding the new mapping exceeded the threshold for rehashing, so the table size was doubled to 16. Rehashing was done, this time with a modulus of 16 instead of 8. Whereas 0 and 8 were both in bucket 0 before, now they're in separate buckets, since there are twice as many buckets available. Same with 1/9 and 2/10. The second entry in each bucket with the old table size of 8 now hashes to its own bucket when the table size is 16. You can see this, since the iteration proceeds sequentially through the buckets, and there is now one entry in each bucket.
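To check this reasoning without looking inside HashMap, the two printed orders can be predicted from the bucket arithmetic alone. The sketch below is a simplified model, not HashMap's implementation: IterationOrderSketch, bucketIndex, and predictedOrder are hypothetical names, and the model assumes Integer keys, a power-of-two table size, no removals, and buckets that are plain linked lists in insertion order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class IterationOrderSketch {
    // Bucket for an int key: shift-and-XOR the hash code, then mask by (tableSize - 1).
    static int bucketIndex(int key, int tableSize) {
        int h = Integer.hashCode(key);
        return (h ^ (h >>> 16)) & (tableSize - 1);
    }

    // Simplified model of iteration: buckets in ascending index order,
    // keys within a bucket in the order they were inserted.
    static List<Integer> predictedOrder(List<Integer> keysInInsertionOrder, int tableSize) {
        Map<Integer, List<Integer>> buckets = new TreeMap<>();
        for (int key : keysInInsertionOrder) {
            buckets.computeIfAbsent(bucketIndex(key, tableSize), b -> new ArrayList<>()).add(key);
        }
        List<Integer> order = new ArrayList<>();
        for (List<Integer> bucket : buckets.values()) {
            order.addAll(bucket);
        }
        return order;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(0, 8, 1, 9, 2, 10, 3);
        // First six keys with a table size of 8: [0, 8, 1, 9, 2, 10]
        System.out.println(predictedOrder(keys.subList(0, 6), 8));
        // All seven keys with a table size of 16: [0, 1, 2, 3, 8, 9, 10]
        System.out.println(predictedOrder(keys, 16));
    }
}
```

The predicted orders match the two printed maps, which is consistent with the table having doubled from 8 to 16 buckets when the seventh mapping was added.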

Of course, I chose the integer values carefully such that collisions occur with the table size of 8 and do not occur with a table size of 16. That lets us see the rehashing very clearly. With more typical objects, the hash codes (and thus the buckets) are harder to predict, so it's harder to see the collisions and what gets shifted around when rehashing occurs.
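As for the second part of the question, whether the old entries can still be retrieved with the old keys after a rehash: they can, because the keys' hash codes never change, only the bucket they map to. Here is a minimal sketch that puts the walkthrough above into one runnable program and looks up every old key after the seventh put. The class name RehashLookupDemo and the helper buildMap are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class RehashLookupDemo {
    // Hypothetical helper: builds a map with a requested capacity of 8
    // and maps each key to itself, in the given insertion order.
    static Map<Integer, Integer> buildMap(int... keys) {
        Map<Integer, Integer> map = new HashMap<>(8);
        for (int key : keys) {
            map.put(key, key);
        }
        return map;
    }

    public static void main(String[] args) {
        int[] oldKeys = {0, 8, 1, 9, 2, 10};
        Map<Integer, Integer> map = buildMap(oldKeys);
        System.out.println("before: " + map);

        // Seventh mapping: exceeds the threshold of 6 described above.
        map.put(3, 3);
        System.out.println("after:  " + map);

        // Every old key is still found, even though some entries moved to new buckets.
        for (int key : oldKeys) {
            System.out.println("get(" + key + ") = " + map.get(key));
        }
    }
}
```

The change in printed order is the only outward sign of the rehash. The lookups behave exactly as they did before it.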

Stuart Marks