
One of the constructors of java.util.concurrent.ConcurrentHashMap:

public ConcurrentHashMap(int initialCapacity) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException();
    int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) ?
               MAXIMUM_CAPACITY :
               tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1));
    this.sizeCtl = cap;
}

What does the argument passed to tableSizeFor(...) mean?

initialCapacity + (initialCapacity >>> 1) + 1

I think the argument should be something like:

(int)(1.0 + (long)initialCapacity / LOAD_FACTOR)

or just:

initialCapacity

I think the expression is wrong, or at least a bug. Did I misunderstand something?
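
For example (my own numbers, using the default load factor of 0.75), the two expressions can disagree even after tableSizeFor rounds up to a power of two:

int initialCapacity = 23;

// what the constructor actually computes: 23 + 11 + 1 = 35, and tableSizeFor(35) = 64
int actual = initialCapacity + (initialCapacity >>> 1) + 1;

// what I would expect: (int) (1.0 + 23 / 0.75) = 31, and tableSizeFor(31) = 32
int expected = (int) (1.0 + (long) initialCapacity / 0.75f);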

I sent a bug report to OpenJDK, and it seems they have officially confirmed that it is most likely a bug: https://bugs.openjdk.java.net/browse/JDK-8202422

Update: Doug Lea commented on the bug; it seems he agrees that it is a bug.

– Anonemous

2 Answers


I strongly suspect it’s an optimization trick.

You’re on the right track. The constructor you cite uses the default load factor of 0.75, so to accommodate initialCapacity elements the hash table size needs to be at least

initialCapacity / 0.75

(roughly the same as multiplying by 1.3333). However, floating-point division is expensive (slightly, not terribly so), and we would additionally need to round up to an integer. I guess an integer division would already help:

(initialCapacity * 4 + 2) / 3

(the + 2 makes sure that the result is rounded up; the * 4 ought to be cheap since it can be implemented as a left shift). The implementors do even better, because shifts are a lot cheaper than divisions:

initialCapacity + (initialCapacity >>> 1) + 1

This is really multiplying by 1.5, so it gives a result that will often be greater than needed, but it’s fast. The + 1 compensates for the fact that the “multiplication” rounds down.

Details: the >>> is an unsigned right shift, filling a zero into the leftmost position. Since we already know that initialCapacity is non-negative, this gives the same result as a division by 2, ignoring the remainder.
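
A tiny demonstration of that shift behaviour (my own throwaway lines, not JDK code):

System.out.println(10 >>> 1);  // 5, same as 10 / 2
System.out.println(11 >>> 1);  // 5, the remainder is simply dropped
System.out.println(-1 >>> 1);  // 2147483647, a zero comes in from the left (-1 >> 1 would stay -1)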

Edit: I may add that tableSizeFor rounds up to a power of two, so most often the same power of two will be the final result even when the first calculation gives a slightly greater result than needed. For example, if you ask for capacity for 10 elements (to keep the calculation simple), a table size of 14 would be enough, whereas the formula yields 16. But 14 would be rounded up to a power of two anyway, so we get 16 either way, and in the end there is no difference. If you ask for room for 12 elements, size 16 would still suffice, but the formula yields 19, which is then rounded up to 32. This is the more unusual case.
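
For reference, tableSizeFor rounds up to the next power of two with a bit-smearing trick; in the JDK 8 sources it looks essentially like this (quoted from memory, so treat it as a sketch):

private static final int MAXIMUM_CAPACITY = 1 << 30;

// smears the highest one-bit of c - 1 into all lower positions,
// so that n + 1 is the next power of two >= c
private static int tableSizeFor(int c) {
    int n = c - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

With that, tableSizeFor(14) and tableSizeFor(16) both give 16, while tableSizeFor(19) gives 32, matching the examples above.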

Further edit: Thank you for the information in the comments that you have submitted this as a JDK bug, and for providing the link: https://bugs.openjdk.java.net/browse/JDK-8202422. The first comment, by Martin Buchholz, agrees with you:

Yes, there is a bug here. The one-arg constructor effectively uses a load-factor of 2/3, not the documented default of 3/4…

I myself would not have considered this a bug, unless you regard it as a bug that you occasionally get a greater capacity than you asked for. On the other hand, you are right, of course (in your exemplarily terse bug report), that there is an inconsistency: you would expect new ConcurrentHashMap(22) and new ConcurrentHashMap(22, 0.75f, 1) to give the same result, since the latter just spells out the documented default load factor/table density; yet the table sizes you get are 64 from the former and 32 from the latter.
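
You can replay that arithmetic directly (my own sketch, reusing the tableSizeFor shown above; the three-arg constructor uses long size = (long) (1.0 + (long) initialCapacity / loadFactor) in the JDK 8 source):

int c = 22;

// one-arg constructor: tableSizeFor(22 + 11 + 1) = tableSizeFor(34)
System.out.println(tableSizeFor(c + (c >>> 1) + 1));               // 64

// three-arg constructor with load factor 0.75: (long) (1.0 + 22 / 0.75) = 30
System.out.println(tableSizeFor((int) (1.0 + (long) c / 0.75f))); // 32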

– Ole V.V.
  • Good explanation. It makes sense once we take the optimization into account. But I still think it does not fully make sense logically: it makes the size of a ConcurrentHashMap unpredictable, unless you debug the source code or do the math with 1.5 rather than 0.75. Really easy to get confused by. Anyway, I will accept your answer. Thanks a lot. – Anonemous Apr 29 '18 at 08:11
  • I get your point, @Anonemous. However, the option to give a capacity and load factor *is* for optimization only; it’s not there for making sense in a logical way. The functional behaviour of both your maps will be the same. Also, the documentation only says that no resizing will be needed for the specified number of elements, which is true for both maps. It’s indirectly saying that no precise size is guaranteed. – Ole V.V. Apr 29 '18 at 08:58
  • @OleV.V. Hmm... I’m not entirely sure this is correct in some parts of the answer. See this: https://stackoverflow.com/a/50088948/1059372 – Eugene Apr 29 '18 at 16:38
  • @OleV.V. Yeah, I am not so sure either, but maybe this URL gives more information about this bug in the comments: https://bugs.openjdk.java.net/browse/JDK-8202422 – Anonemous May 01 '18 at 03:43
  • @OleV.V. The bug is currently assigned to Doug Lea. I am not sure this is really a bug; maybe Doug Lea will give a reasonable explanation like your answer. I will see, and I will keep tracking this bug. – Anonemous May 01 '18 at 03:55
  • @OleV.V. I will send your answer’s URL to OpenJDK for reference. Maybe it will give them an idea. – Anonemous May 01 '18 at 04:23
  • @OleV.V. That also means that I was sort of right... in the sense that the one-arg constructor is wrong (see my second comment to my answer). – Eugene May 01 '18 at 04:36
  • @OleV.V. Check the OpenJDK bug URL; your answer is cited under the bug’s comments: https://bugs.openjdk.java.net/browse/JDK-8202422. I think they still need to evaluate it. – Anonemous May 02 '18 at 14:10
  • @OleV.V. Doug Lea commented on the bug. Seems he agrees that it is a bug. – Anonemous May 29 '18 at 10:45

When you say (int)(1.0 + (long)initialCapacity / LOAD_FACTOR), it makes sense for HashMap, but not for ConcurrentHashMap, at least not in the same sense it does for HashMap.

For HashMap, capacity is the number of buckets before a resize happens; for ConcurrentHashMap, it is the number of entries you can insert before a resize is performed.

Testing this is fairly easy:

import java.lang.reflect.AccessibleObject;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

private static int currentResizeCalls;

private static <K, V> void debugResize(Map<K, V> map, K key, V value) throws Throwable {

    // peek at the map's internal bucket array before the put
    Field table = map.getClass().getDeclaredField("table");
    AccessibleObject.setAccessible(new Field[] { table }, true);
    Object[] nodes = (Object[]) table.get(map);

    // first put: the table is allocated lazily, so there is nothing to compare yet
    if (nodes == null) {
        map.put(key, value);
        return;
    }

    map.put(key, value);

    // re-read the table through the same Field and count a resize if its length changed
    Object[] after = (Object[]) table.get(map);
    if (nodes.length != after.length) {
        ++currentResizeCalls;
    }
}


public static void main(String[] args) throws Throwable {

    // replace with new ConcurrentHashMap<>(1024) to see a different result
    Map<Integer, Integer> map = new HashMap<>(1024);

    for (int i = 0; i < 1024; ++i) {
        debugResize(map, i, i);
    }

    System.out.println(currentResizeCalls);

}

For HashMap, the resize happened once; for ConcurrentHashMap, it didn’t.
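
The arithmetic behind that observation (my own back-of-the-envelope numbers, assuming the JDK 8 sizing code):

// HashMap(1024): table length 1024, resize threshold 1024 * 0.75 = 768,
// so the 769th insertion triggers exactly one resize while putting 1024 entries

// ConcurrentHashMap(1024): cap = tableSizeFor(1024 + 512 + 1) = 2048,
// effective threshold 2048 * 0.75 = 1536 > 1024, so no resize happens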

And growing by 1.5 is not a new thing at all; ArrayList has the same strategy.
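
For reference, ArrayList.grow in the JDK 8 sources computes the new capacity with the same shift idiom (quoted from memory, so treat it as a sketch):

// from java.util.ArrayList.grow (JDK 8), abridged:
int newCapacity = oldCapacity + (oldCapacity >> 1);  // grow by roughly 1.5x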

The shifts, well, they are cheaper than the usual math; and >>> is the unsigned variant, which is why it is used here.

– Eugene
  • Capacity indeed means two different things for `HashMap` and `ConcurrentHashMap`, as you have explained. Therefore it would make sense for *the caller* of the `HashMap` constructor to apply a formula like desiredMinSize / loadFactor, while it makes sense for the `ConcurrentHashMap` constructor *internally* to apply a similar formula. Which, BTW, the 3-arg `ConcurrentHashMap` constructor also does, and I quote: `long size = (long)(1.0 + (long)initialCapacity / loadFactor);`. – Ole V.V. Apr 29 '18 at 16:58
  • @OleV.V. I read my answer again, and boy does it need refactoring. My idea was that I see it the other way around: shifting is probably the initial/old way to compute the capacity. – Eugene Apr 29 '18 at 18:35
  • @Eugene I reported this bug to OpenJDK, and it seems they confirmed it as a bug: https://bugs.openjdk.java.net/browse/JDK-8202422 – Anonemous May 01 '18 at 03:47
  • @Anonemous Good! I was hoping it would be that way, hence my comment that I see it the other way around: the one-arg constructor should be treated as something older, thus incorrect. – Eugene May 01 '18 at 04:39
  • @Eugene Doug Lea commented on the bug. Seems he agrees that it is a bug. – Anonemous May 30 '18 at 05:47