Can you please explain this code snippet from the HashMap constructor, specifically the line

capacity <<= 1;

// Find a power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;
Geek

4 Answers


It is equivalent to capacity = capacity << 1;.
That operation shifts capacity's bits one position to the left, which is equivalent to multiplying by 2.

The specific code you posted finds the smallest power of 2 that is greater than or equal to initialCapacity.

So if initialCapacity is 27, for example, capacity will be 32 (2^5) after the loop.
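
To make the rounding behaviour concrete, here is a minimal sketch of the same loop wrapped in a method. The class name `PowerOfTwoCapacity` and the method name `roundUpToPowerOfTwo` are made up for illustration and are not part of the JDK. The guard against values above 2^30 is an assumption of this sketch: without some upper bound the shift would overflow an `int` and the loop would never terminate (the real constructor, as far as I know, clamps the requested capacity to a maximum before reaching this loop).

```java
final class PowerOfTwoCapacity {

    // Mirrors the loop from the question: start at 1 and keep doubling
    // until the value is no longer smaller than initialCapacity.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        // Assumption of this sketch: reject inputs for which doubling would overflow.
        if (initialCapacity > (1 << 30)) {
            throw new IllegalArgumentException("initialCapacity too large: " + initialCapacity);
        }
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1; // same effect as capacity = capacity << 1, i.e. capacity * 2
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(27)); // 32
        System.out.println(roundUpToPowerOfTwo(32)); // 32, already a power of 2
        System.out.println(roundUpToPowerOfTwo(33)); // 64
        System.out.println(roundUpToPowerOfTwo(0));  // 1, the loop body never runs
    }
}
```

Note that an input which is already a power of 2 is returned unchanged, because the loop condition uses `<` rather than `<=`.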

assylias
  • Not strictly equivalent, though. Different precedence rules may apply. – Romain Aug 22 '12 at 11:16
  • @assylias But why a power of 2? If we use the division method for hashing and take the mod of a power of 2 (say 2^w), we only consider the rightmost w bits, I think ... – Geek Aug 22 '12 at 11:19
  • @Geek You would be right except HashMap also uses a rehashing method called `hash(int)` which computes `h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4);` ;) – Peter Lawrey Aug 22 '12 at 11:20
  • @StephenC HashMap always uses a power of 2 for its capacity. – Peter Lawrey Aug 22 '12 at 11:22
  • @StephenC What do you mean by "the "power of 2" bit is AN EXAMPLE."? It seems from the code that HashMap will ALWAYS have a power of 2 as its capacity, irrespective of its load factor. – Geek Aug 22 '12 at 11:22
  • @PeterLawrey What's going on in that fancy method? I am poor at bit manipulation ... – Geek Aug 22 '12 at 11:23
  • @Geek The method finds the smallest power of 2 that is greater than or equal to initialCapacity. So if initialCapacity = 27, for example, capacity will be 32 (2^5) after that loop. – assylias Aug 22 '12 at 11:24
  • @assylias I see now what it is doing, but again, why are they going for this power of 2 approach, which seems counterintuitive to me? How is this double hashing solving the problem of taking all bits into consideration? – Geek Aug 22 '12 at 11:27
  • @Geek Simply put, the hash method pulls down bits from the 20th and 12th and then the 7th and 4th for a combined effect of taking bits from the 4, 7, 12, 16, 19, 20, 24 and 27th bits, i.e. it jumbles them all up. In the case of the default size, 16, the top bit is ignored, but for a size of 32+ all the bits are used. – Peter Lawrey Aug 22 '12 at 11:29
  • @Geek The logic is that doing a rehash plus a bitwise AND mask is faster than using `%` and handling the sign (which involves a branch). – Peter Lawrey Aug 22 '12 at 11:30
  • @PeterLawrey Thanks for this explanation. I see now the intuition for going for the 2^w approach. – Geek Aug 22 '12 at 11:31
  • @Geek I suspect avoiding the branch prediction misses is the main gain. ;) – Peter Lawrey Aug 22 '12 at 11:33
  • @Peter, the branch is easily removed by `hash&Integer.MAX_VALUE`. Mod is just an expensive operation (like 30 times more expensive than bitwise AND, and it uses an extra CPU register, which is also taxing). The latter, bitwise AND with Integer.MAX_VALUE, shall always be applied for a hash index into an array with the top bit reserved for 'state'. I know of no hashtable in Java that uses all 32 bits of the provided `hashCode()`. – bestsss Aug 25 '12 at 10:35
  • @Geek, the main reason is that mod is quite expensive compared to bitwise AND. – bestsss Aug 25 '12 at 10:36
  • @Peter, using mod is justified with a prime-number-sized table since it reduces collisions. Collision reduction is improved by the very bit scramble you explain. Collisions in java.util.HashMap are somewhat expensive since they lead to a cache miss... an open-address table is usually better with a pow2-sized table, and collisions are cheaper. Overall, like I have stated multiple times, java.util.HashMap blows hard. – bestsss Aug 25 '12 at 10:42
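
To illustrate the point discussed in these comments, here is a rough sketch of why a power-of-2 table size goes together with a supplemental hash. The `hash(int)` body is the one quoted by Peter Lawrey above; the class name `PowerOfTwoIndexing` and the helpers `indexByMask` and `indexByMod` are hypothetical names chosen for this example, not JDK API. The sketch makes no claim about relative speed; it only shows that the mask and the sign-corrected `%` pick the same bucket when the length is a power of 2, and that the supplemental hash lets high bits influence the bucket.

```java
final class PowerOfTwoIndexing {

    // Supplemental hash as quoted in the comments above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Only valid when length is a power of 2: length - 1 is then a mask of
    // the low bits, so the result is always in the range [0, length).
    static int indexByMask(int h, int length) {
        return h & (length - 1);
    }

    // Works for any positive length, but % can yield a negative remainder,
    // so the sign has to be handled explicitly.
    static int indexByMod(int h, int length) {
        int r = h % length;
        return r < 0 ? r + length : r;
    }

    public static void main(String[] args) {
        int length = 16;
        int a = 0x10000; // these two values differ only in bits above the low 4
        int b = 0x20000;

        // Without the supplemental hash both land in bucket 0.
        System.out.println(indexByMask(a, length) + " " + indexByMask(b, length));

        // With the supplemental hash the high bits influence the bucket.
        System.out.println(indexByMask(hash(a), length) + " " + indexByMask(hash(b), length));

        // For a power-of-2 length the mask and the sign-corrected mod agree,
        // including for negative hash codes.
        System.out.println(indexByMask(-1, length) == indexByMod(-1, length)); // true
    }
}
```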

Just like var += 1 is about equivalent to var = var + 1, what you see here (var <<= 1) is about equivalent to var = var << 1, which is "set var to be the result of a 1-position binary left-shift of var."

In this very specific case, it's another way of expressing capacity *= 2 (because a bitwise left-shift of 1 position is equivalent to a multiplication by 2). It may be marginally faster at runtime, though a JIT compiler will typically generate the same code for both.
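
A small sketch of what "about equivalent" means in practice. For an `int` variable, `x <<= 1`, `x = x << 1` and `x *= 2` all produce the same value, even on overflow. The one real difference is that a compound assignment includes an implicit cast back to the variable's type, which matters for types narrower than `int`. The class and method names below are invented for this example.

```java
final class CompoundShiftDemo {

    static int doubleByShift(int x) {
        x <<= 1; // compound form
        return x;
    }

    static int doubleByMultiply(int x) {
        x *= 2;
        return x;
    }

    static byte shiftByte(byte b) {
        b <<= 1; // compiles: behaves like b = (byte) (b << 1)
        // b = b << 1; // would NOT compile: b << 1 has type int
        return b;
    }

    public static void main(String[] args) {
        System.out.println(doubleByShift(27));    // 54
        System.out.println(doubleByMultiply(27)); // 54

        // Both forms wrap around in the same way on overflow.
        System.out.println(doubleByShift(Integer.MAX_VALUE));    // -2
        System.out.println(doubleByMultiply(Integer.MAX_VALUE)); // -2

        // The implicit narrowing cast silently discards the high bits.
        System.out.println(shiftByte((byte) 64)); // -128
    }
}
```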

Romain

It is equivalent to

capacity = capacity << 1;

which shifts the bits in capacity one position to the left (so, e.g., 00011011 becomes 00110110).
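
The bit pattern in that example can be checked with a few lines of Java. This is only an illustrative sketch; `ShiftBitsDemo` and `toBinary8` are hypothetical names, and the helper prints just the low 8 bits so the output lines up with the example above.

```java
final class ShiftBitsDemo {

    // Formats the low 8 bits of value as a zero-padded binary string.
    static String toBinary8(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        return "00000000".substring(bits.length()) + bits;
    }

    public static void main(String[] args) {
        int capacity = 27;
        System.out.println(toBinary8(capacity)); // 00011011

        capacity <<= 1; // every bit moves one position to the left
        System.out.println(toBinary8(capacity)); // 00110110
        System.out.println(capacity);            // 54
    }
}
```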

Kuba Wyrostek

Each time the loop body executes, the value of capacity becomes the next higher power of 2.

For example, initially it is 1, i.e. 2^0; the first execution of capacity <<= 1 makes it 2^1, the next makes it 2^2, and so on. You may like to read more about it at http://www.tutorialspoint.com/java/java_basic_operators.htm

Nitish Pareek