
As per the code below, a HashMap's default initial capacity is 16 and the load factor is 0.75, so rehashing should occur when I insert the 13th element. Because I have provided a constant hashCode, all entries land in the same bucket, where a linked list preserves insertion order. Up to the 10th element it works as expected, but when I insert the 11th element the order gets shuffled. Can anybody help me understand why this happens only at the 11th insertion?

import java.util.HashMap;
import java.util.Map;

class A {
    int a;

    A(int a){
        this.a = a;
    }
    @Override
    public int hashCode() {
        return 7;
    }
    @Override
    public String toString() {
        return "" + a + "";
    }
}
class Base {
    public static void main(String[] args) {
        Map<Object, Integer> m = new HashMap<Object, Integer>();
        m.put(new A(1), 1);
        m.put(new A(2), 1);
        m.put(new A(3), 1);
        m.put(new A(4), 1);
        m.put(new A(5), 1);
        m.put(new A(6), 1);
        m.put(new A(7), 1);
        m.put(new A(8), 1);
        m.put(new A(9), 1);
        m.put(new A(10), 1);
        //m.put(new A(11), 1);
        System.out.println(m);
    }
}

Output I get up to the 10th element:

{1=1, 2=1, 3=1, 4=1, 5=1, 6=1, 7=1, 8=1, 9=1, 10=1}

Output I get after inserting the 11th element:

{4=1, 1=1, 2=1, 3=1, 5=1, 6=1, 7=1, 8=1, 9=1, 10=1, 11=1}

Shouldn't it maintain insertion order? Or, if it is using a red-black tree internally, which traversal is it using in this case?

2 Answers


Shouldn't it maintain insertion order? Or, if it is using a red-black tree internally, which traversal is it using in this case?

There’s no “should”; the HashMap does not guarantee any order. What actually happens in the current implementation is determined by two constants: TREEIFY_THRESHOLD = 8 and MIN_TREEIFY_CAPACITY = 64.

When the number of items in one bucket exceeds the former, the bucket is converted into a tree of nodes, unless the map's total capacity is smaller than the latter constant; in that case, the capacity is doubled instead.
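As a rough sketch of that decision (the two constants are the real values from the current OpenJDK sources, but the helper method itself is hypothetical, purely to illustrate the rule):

static final int TREEIFY_THRESHOLD = 8;
static final int MIN_TREEIFY_CAPACITY = 64;

// Illustrative only: what HashMap decides when a put makes one bucket overly long.
static String onOverfullBucket(int bucketSize, int tableLength) {
    if (bucketSize <= TREEIFY_THRESHOLD)
        return "keep the linked list";
    if (tableLength < MIN_TREEIFY_CAPACITY)
        return "resize: double the table instead of treeifying";
    return "treeify: convert the bucket to a red-black tree";
}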

So when you insert the 9th object, the capacity is raised from 16 to 32; inserting the 10th raises it from 32 to 64; then inserting the 11th element causes the bucket to be converted into a tree.

This conversion will always happen, whether there is an actual benefit or not. Since the objects have identical hash codes and do not implement Comparable, the implementation ends up using their identity hash codes to determine the order within the tree. This may result in a change of the iteration order (in my environment, it does not).
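If you want to observe this yourself, here is a small diagnostic sketch. It reflects into HashMap's private table field, which is purely an implementation detail and may require --add-opens java.base/java.util=ALL-UNNAMED on recent JDKs. Using the A class from the question, it should print the 16 → 32 → 64 capacity sequence described above:

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class CapacityProbe {
    // Reads HashMap's private "table" field; purely diagnostic and JDK-specific.
    static int capacity(Map<?, ?> m) throws Exception {
        Field f = HashMap.class.getDeclaredField("table");
        f.setAccessible(true);
        Object[] table = (Object[]) f.get(m);
        return table == null ? 0 : table.length;
    }

    public static void main(String[] args) throws Exception {
        Map<Object, Integer> m = new HashMap<>();
        for (int i = 1; i <= 11; i++) {
            m.put(new A(i), 1);
            // expected: 16 up to the 8th put, 32 after the 9th, 64 after the 10th,
            // still 64 after the 11th (the bucket is treeified instead)
            System.out.println("after put #" + i + ": capacity = " + capacity(m));
        }
        System.out.println(m);
    }
}

The final println(m) then shows whether the treeified bucket changed your iteration order.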

Holger
  • Also worth stressing that all this behaviour is an implementation detail, not part of the public interface. So it's subject to change, and *your code should not assume or rely on it*! – gidds Feb 15 '19 at 17:33
  • Thanks, @Holger! So it will be converted into a tree at the 11th element, not at the 9th, right? – Vi_Code Feb 15 '19 at 17:43
  • @gidds Of course. It’s implementation-specific and just for this specific corner case. That’s why my answer starts with “*the HashMap does not guarantee any order*”. – Holger Feb 18 '19 at 06:52

It is independent of the particular hash code value specified (i.e. 7); rather, it is your hashCode being constant that causes it. Below is why:

I went through the source code of HashMap's put method; there is a constant TREEIFY_THRESHOLD that decides when to convert a normal bucket into a tree.

static final int TREEIFY_THRESHOLD = 8;

The relevant code snippet is below (the put method delegates to putVal):

    ...
    for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null);
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                treeifyBin(tab, hash);
            break;
        }
        ...
    }
    ...

Take note of the line containing the if (binCount >= TREEIFY_THRESHOLD - 1) condition. As soon as a bucket is filled beyond TREEIFY_THRESHOLD entries, it calls the treeifyBin() method.

This method in turn calls resize() as long as the table length is still below MIN_TREEIFY_CAPACITY; only once the table has reached that capacity does it actually convert the bucket into a tree.

 final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;

            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

Look for the following condition in the snippet above:

if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();

The resize() method then grows the table based on the multiple conditional checks it performs. The load factor determines when an automatic resize is triggered; each resize gives the table roughly double the number of buckets.
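To put the pieces together for the question's scenario, here is a small, hypothetical replay of these rules (not JDK code) showing which action each of the 11 insertions triggers when every key lands in the same bucket:

public class Trace {
    static final int TREEIFY_THRESHOLD = 8;      // real JDK constant
    static final int MIN_TREEIFY_CAPACITY = 64;  // real JDK constant

    public static void main(String[] args) {
        int capacity = 16;          // default initial capacity
        boolean treeified = false;
        for (int n = 1; n <= 11; n++) {          // n = bucket size after this insert
            String action = "append to the bucket's linked list";
            // putVal calls treeifyBin once the bucket exceeds TREEIFY_THRESHOLD nodes
            if (n > TREEIFY_THRESHOLD && !treeified) {
                if (capacity < MIN_TREEIFY_CAPACITY) {
                    capacity *= 2;
                    action = "treeifyBin -> resize to capacity " + capacity;
                } else {
                    treeified = true;
                    action = "treeifyBin -> convert bucket to red-black tree";
                }
            }
            System.out.println("insert #" + n + ": " + action);
        }
    }
}

It prints "append" for inserts 1 through 8, a resize to 32 at the 9th, a resize to 64 at the 10th, and the tree conversion at the 11th, which matches the point at which the question's iteration order changes.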

Just like treeify, there is a reverse operation: if the number of elements in a tree bin shrinks, an untreeify operation converts it back to a linked list, using UNTREEIFY_THRESHOLD (i.e. 6) as the threshold.

I referenced this link to go through the HashMap code.

Vinay Prajapati
  • Yes, completely agree. Are you satisfied with the above answer given by Holger? – Vi_Code Feb 15 '19 at 17:45
  • Yes! Except I am in doubt about the doubling of capacity. I didn't go through the resize code much further, but I think it should be load_factor*capacity and not 2*capacity. – Vinay Prajapati Feb 16 '19 at 01:54
  • @VinayPrajapati On what basis do you think that? The purpose of the load factor is well specified in [the API documentation](https://docs.oracle.com/javase/8/docs/api/?java/util/HashMap.html): “*The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased*”. It even says, “*When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed … so that the hash table has approximately twice the number of buckets*”. In this implementation, it’s exactly twice. – Holger Feb 18 '19 at 06:56