So far, I know that when a HashMap is rehashed, all the entries are redistributed using the new table length. But I want to know what happens to entries that collide.

e.g.

Map<String, String> map = new HashMap<>(5); 
map.put("a", "ape");
map.put("b", "bird"); 
map.put("c", "chicken");

Suppose they have different hashcodes, but "b" and "c" are stored in the same bucket after the internal hashing.

Now I'll insert a fourth entry to reach the load factor, which triggers a rehash of the table:

map.put("d", "dynamite");

Could the colliding entries end up in separate buckets, or will they always stay together (in reverse order, according to what I've read)?

I suppose that the answer to the title is no, because I will get the same internal hashing for "b" and "c", but I'm not sure.

emer

2 Answers

They could be stored in the same bucket or in different buckets, depending on whether the value of the expression hashcode % capacity stays the same after rehashing.

E.g. let's say the hashCodes returned by the String objects "b" and "c" are 27 and 32. Your initial capacity is 5. So the expression hashcode % capacity evaluates to 2 for both "b" and "c". Therefore they will both be stored in the same bucket. Now after the rehashing (which happens when the number of entries in the hash table exceeds the product of the load factor and the current capacity), the capacity approximately doubles. Let's say the new capacity is 10. The expression hashcode % capacity will now evaluate to 7 and 2 respectively. This means that the 2 objects will be stored in separate buckets after rehashing.

Now consider the following case. Say the hashCodes returned by the 2 objects are 27 and 37 instead. In this case, the expression hashcode % capacity evaluates to 2 and 2 before rehashing and to 7 and 7 after rehashing. So they will still be stored in the same bucket.
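
The arithmetic in both cases can be checked with a small sketch. It assumes the simplified "hash code modulo capacity" rule used in this answer; the class name and the `bucket` helper are made up for illustration and are not part of java.util.HashMap.

```java
// Illustrative only: a simplified bucket index using modulo, as in the answer above.
// This is an assumption for the example, not how java.util.HashMap is implemented.
class ModuloBucketDemo {
    // Hypothetical helper: maps a hash code to a bucket index for a given capacity.
    static int bucket(int hashCode, int capacity) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(hashCode, capacity);
    }

    public static void main(String[] args) {
        int[] hashCodes = {27, 32, 37};
        // Capacity 5 is the table before rehashing, 10 the table after it.
        for (int capacity : new int[] {5, 10}) {
            for (int h : hashCodes) {
                System.out.println("hash " + h + ", capacity " + capacity
                        + " -> bucket " + bucket(h, capacity));
            }
        }
    }
}
```

With capacity 5 all three hash codes land in bucket 2. With capacity 10, 27 and 37 share bucket 7 while 32 moves to bucket 2.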

VHS
  • Yes, it makes more sense and it's clearer to me using the modulo. You and @Michał Kosmulski gave me two examples with the mod operator. However, the Java Collections Framework uses bit masking instead. Thinking about it, I assume that gives the same result and is just an implementation detail – emer Mar 08 '16 at 22:29

There are two ways you can view collisions here.

One is two objects returning the same value from the hashCode() method. In this case, they will end up in the same bucket no matter what size the hash table array is.

The other case is when two objects have different hash codes but end up in the same bucket because the array size is smaller than the 2^32 unique values that hashCode() can in theory return. Usually, the raw hash code value is taken modulo the array size, and the result is used to find the right bucket for an entry. Suppose the initial array size is 16 and you have object A with hash code 3 and object B with hash code 19. Since 19 % 16 == 3, object A and object B will end up in the same bucket. If you now resize the array to 20, object A will end up in bucket 3 % 20 == 3 but object B will end up in bucket 19 % 20 == 19. So now they are in different buckets, which answers the question posed in the title with a "yes".
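
Both kinds of collision can be compared side by side in a short sketch. It assumes the plain "hash code modulo array size" rule described above, with array sizes 16 and 20; the class and the `sameBucket` helper are hypothetical names used only for this example.

```java
// Illustrative only: compares the two kinds of collision under a modulo-based index.
class ResizeCollisionDemo {
    // Hypothetical helper: true if both hash codes map to the same bucket
    // for the given array size.
    static boolean sameBucket(int hashA, int hashB, int arraySize) {
        return Math.floorMod(hashA, arraySize) == Math.floorMod(hashB, arraySize);
    }

    public static void main(String[] args) {
        // Case 1: identical hash codes collide at every array size.
        System.out.println("19 vs 19, size 16: " + sameBucket(19, 19, 16));
        System.out.println("19 vs 19, size 20: " + sameBucket(19, 19, 20));

        // Case 2: different hash codes that collide only at some array sizes.
        System.out.println("3 vs 19, size 16: " + sameBucket(3, 19, 16));
        System.out.println("3 vs 19, size 20: " + sameBucket(3, 19, 20));
    }
}
```

Only the last line prints false: hash codes 3 and 19 share bucket 3 at size 16 but are separated at size 20, while equal hash codes can never be separated by resizing.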

Michał Kosmulski
  • Thanks. Just to be succinct: do bit masking and mod give the same result? – emer Mar 08 '16 at 23:03
  • @EMER Yes, suppose we used shorter hashes, just 5 bits. Object A has hash code (binary) equal to 11000, object B has hash code 10000. For an array of length 8, we use a bit mask of 3 bits, resulting in bucket 000 in both cases. If we increase array size to 16 and use 4 bits for the mask, object A will be in bucket 1000 but object B will be in 0000: so they end up in different buckets with the larger array. – Michał Kosmulski Mar 09 '16 at 09:08
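
The 5-bit example from the last comment can be reproduced with a minimal sketch. It assumes the array length is a power of two, which is the only case where masking with `length - 1` matches the modulo result. The class and helper names are invented for illustration. OpenJDK's HashMap also mixes the high bits of the hash into the low bits before masking; that step is left out here.

```java
// Illustrative only: bucket selection by bit mask for power-of-two array lengths.
class BitMaskBucketDemo {
    // Hypothetical helper: keeps the low bits of the hash.
    // Assumes length is a power of two, so (length - 1) is a mask of all ones.
    static int bucket(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0b11000; // object A from the comment above
        int b = 0b10000; // object B from the comment above

        for (int length : new int[] {8, 16}) {
            System.out.println("length " + length
                    + ": A -> " + Integer.toBinaryString(bucket(a, length))
                    + ", B -> " + Integer.toBinaryString(bucket(b, length)));
        }

        // For a power-of-two length, masking agrees with a non-negative modulo.
        System.out.println(bucket(a, 16) == Math.floorMod(a, 16));
    }
}
```

For length 8 both objects map to bucket 0. For length 16, A maps to bucket 8 (binary 1000) and B stays in bucket 0.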