8

Why is calling containsKey on a HashMap slower than get?

Test: http://ideone.com/QsWXF (>15% difference, run on sun-jdk-1.6.0.17)

Stefan
  • 1
    Not sure, but it's a pretty sketchy benchmark in terms of what the compiler/JIT will optimize away. – Dave Newton Nov 03 '11 at 17:51
  • 1
    But then the question would be why they don't benefit equally from JIT optimization. – Stefan Nov 03 '11 at 18:01
  • 1
    @Stefan: I know your code is just an example, but if you're planning to really use things like *Map{Integer,Integer}*, then something like Trove's *TIntIntHashMap* (which is entirely implemented using primitives, not objects) will run circles around the default *HashMap*, and the improvement will be far more than a mere 15% ; ) Of course it won't help when you need objects in your collections, but when you have collections of primitives it really does outperform the default collections. – TacticalCoder Nov 03 '11 at 18:34
  • I don't doubt that. This is not critical; I was just curious because I barely ever use contains at all, since I usually need the content anyway. But I wanted to see how much 'faster' it is and whether I should consider using it to gain small benefits. – Stefan Nov 03 '11 at 18:52
  • If you have a method which doesn't do anything and the result is ignored, the JVM can optimise the code in ways which are not realistic. Additionally, when you have a long-running loop (10000+ iterations), it can trigger the whole method to be compiled, even if the rest of the method hasn't been run and won't be optimised correctly. Also, there have been loads of JVM optimisations between update 18 and update 29. – Peter Lawrey Nov 03 '11 at 22:51
  • The execution path for both methods should be almost identical. The question is not whether this is a good benchmark; the question is why there is a difference. Maybe I should have phrased the question that way. – Stefan Nov 04 '11 at 13:48

4 Answers

11

Because it does [ever so slightly] more work, see the OpenJDK 7 source.


Note that containsKey calls getEntry while get directly "does the magic lookup". See John B's and Ted Hopp's comments as to why it was done this way, including why getForNullKey is used in one path and not the other.

get has an early code split for a null-key (note that get will return null if the entry doesn't exist or existed with a null value stored):

    if (key == null)
        return getForNullKey();
    ...
        if (e.hash == hash &&
            ((k = e.key) == key || key.equals(k)))
            return e.value;

While getEntry, called from containsKey, does not split to getForNullKey and there is additional work here to check for the null-key case (for each Entry scanned in the chain):

        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;

Also, containsKey has the additional conditional and method call (note that getEntry will return an Entry object, if said key exists, even if the stored value is null):

    return getEntry(key) != null;
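The null-value distinction mentioned above can be seen with a small sketch. The class name NullValueDemo and the helper presentViaGet are made up for illustration; only the java.util.HashMap calls are real API.

```java
import java.util.HashMap;
import java.util.Map;

public class NullValueDemo {
    // Hypothetical helper: a "get-based" presence test, which cannot
    // tell a missing key apart from a key mapped to null.
    static <K, V> boolean presentViaGet(Map<K, V> map, K key) {
        return map.get(key) != null;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", null); // key is present, value is null

        System.out.println(map.containsKey("b"));    // true
        System.out.println(presentViaGet(map, "b")); // false
        System.out.println(map.containsKey("c"));    // false
        System.out.println(presentViaGet(map, "c")); // false
    }
}
```

So containsKey has to answer from the entry itself, not from the value stored in it.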

I suppose it could be argued that containsKey would benefit, in terms of "performance", from having a specialized form (at the expense of less-DRY code), or that getEntry could follow the lead of get with an early null-key check. On the other hand, it might be argued that get should be written in terms of getEntry ;-)
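As a rough illustration of that last option, here is a toy chained map in which get and containsKey both go through a single getEntry routine. This is a sketch and not the JDK code: TinyMap, its fixed 16-bucket table (no resizing), and the bit-mixing in indexFor are all assumptions made for the example.

```java
import java.util.Objects;

// Toy chained hash map; all names are hypothetical, not JDK internals.
public class TinyMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        final Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    // A null key hashes to 0, so it always lands in bucket 0.
    private int indexFor(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return (h ^ (h >>> 16)) & (table.length - 1);
    }

    // The one lookup routine shared by put, get and containsKey.
    private Entry<K, V> getEntry(Object key) {
        for (Entry<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (Objects.equals(key, e.key)) {
                return e;
            }
        }
        return null;
    }

    public void put(K key, V value) {
        Entry<K, V> e = getEntry(key);
        if (e != null) {
            e.value = value; // overwrite the existing mapping
            return;
        }
        int i = indexFor(key);
        table[i] = new Entry<K, V>(key, value, table[i]);
    }

    // get written in terms of getEntry: one extra field read.
    public V get(Object key) {
        Entry<K, V> e = getEntry(key);
        return (e == null) ? null : e.value;
    }

    // containsKey only asks whether the entry exists.
    public boolean containsKey(Object key) {
        return getEntry(key) != null;
    }

    public static void main(String[] args) {
        TinyMap<String, Integer> m = new TinyMap<String, Integer>();
        m.put("a", 1);
        m.put(null, 2);
        m.put("n", null);
        System.out.println(m.get("a") + " " + m.get(null) + " " + m.get("n")); // 1 2 null
        System.out.println(m.containsKey("n") + " " + m.containsKey("z"));     // true false
    }
}
```

With this shape the two methods share the same execution path, so any remaining difference would come down to the extra field read versus the null comparison.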

Happy coding.

  • 1
    That seems so odd since you could insert a value of `null` into a map and the key would exist and this method would return `false`. – John B Nov 03 '11 at 17:53
  • 4
    It was done that way to support null values. (`map.containsKey(key)` is not equivalent to `map.get(key) == null`.) – Ted Hopp Nov 03 '11 at 17:56
3

I haven't tried to reproduce your results yet, but my first guess is that get simply returns the value (if any) that was found, while containsKey (which is what you tested, not contains) needs to test whether the key exists (with either a null or non-null value) and then return a boolean. Just a little more overhead involved.

Ted Hopp
  • 3
    No, it doesn't need to check whether the value is null - because in that case it should *still* return true. It should *just* be checking whether the entry exists. – Jon Skeet Nov 03 '11 at 17:56
  • @Jon - Right you are. (That's what I meant, of course. ;-)) Amending my language. – Ted Hopp Nov 03 '11 at 18:35
  • But I don't think I'd say it's more work - `contains` *only* needs to test for the existence of the entry, rather than fetching the value *from* the entry. `get` still has to test for the existence of the entry too... – Jon Skeet Nov 03 '11 at 18:49
  • @Jon That was actually my thought: that contains would be 'less' work, since I only need a yes/no answer. – Stefan Nov 03 '11 at 18:54
  • @Jon - Perhaps it's just that the way `containsKey` is implemented involves an extra method call (versus an extra field access in `get`). `get` does not do an explicit test that the key exists; it just falls out of the loop the same way that `containsKey` does. See the code posted by Mehrdad. – Ted Hopp Nov 03 '11 at 19:01
3

Let's see the source code:

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}


public boolean containsKey(Object key) {
    return getEntry(key) != null;
}

final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

Maybe it's because of the extra method call or because of the repeated check for key != null?

user541686
3

Testing different sizes of hashmaps, if there is a bias in performance, it's very small.

Running on Java 7 update 1 with a 4.6 GHz i7 2600K.

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class HashMapPerfMain {
    public static void main(String... args) {
        Integer[] keys = generateKeys(2 * 1000 * 1000);

        Map<Integer, Boolean> map = new HashMap<Integer, Boolean>();
        for (int j = 0; j < keys.length; j += 2)
            map.put(keys[j], true);

        for (int t = 0; t < 5; t++) {
            long start = System.nanoTime();
            int count = countContainsKey(map, keys);
            long time = System.nanoTime() - start;
            assert count == keys.length / 2;

            long start2 = System.nanoTime();
            int count2 = countGetIsNull(map, keys);
            long time2 = System.nanoTime() - start2;
            assert count2 == keys.length / 2;
            System.out.printf("Map.containsKey avg %.1f ns, ", (double) time / keys.length);
            System.out.printf("Map.get() == null avg %.1f ns, ", (double) time2 / keys.length);
            System.out.printf("Ratio was %.2f%n", (double) time2/ time);
        }
    }

    private static int countContainsKey(Map<Integer, Boolean> map, Integer[] keys) {
        int count = 0;
        for (Integer i : keys) {
            if (map.containsKey(i)) count++;
        }
        return count;
    }

    private static int countGetIsNull(Map<Integer, Boolean> map, Integer[] keys) {
        int count = 0;
        for (Integer i : keys) {
            if (map.get(i) == null) count++;
        }
        return count;
    }

    private static Integer[] generateKeys(int size) {
        Integer[] keys = new Integer[size];
        Random random = new Random();
        for (int i = 0; i < keys.length; i++)
            keys[i] = random.nextInt();
        return keys;
    }
}

prints for half a million keys

Map.containsKey avg 27.1 ns, Map.get() == null avg 26.4 ns, Ratio was 0.97
Map.containsKey avg 19.6 ns, Map.get() == null avg 19.6 ns, Ratio was 1.00
Map.containsKey avg 18.3 ns, Map.get() == null avg 19.0 ns, Ratio was 1.04
Map.containsKey avg 18.2 ns, Map.get() == null avg 19.1 ns, Ratio was 1.05
Map.containsKey avg 18.3 ns, Map.get() == null avg 19.0 ns, Ratio was 1.04

prints for one million keys

Map.containsKey avg 30.9 ns, Map.get() == null avg 30.9 ns, Ratio was 1.00
Map.containsKey avg 26.0 ns, Map.get() == null avg 25.5 ns, Ratio was 0.98
Map.containsKey avg 25.0 ns, Map.get() == null avg 24.9 ns, Ratio was 1.00
Map.containsKey avg 25.0 ns, Map.get() == null avg 24.9 ns, Ratio was 1.00
Map.containsKey avg 24.8 ns, Map.get() == null avg 25.0 ns, Ratio was 1.01

however for two million keys

Map.containsKey avg 36.5 ns, Map.get() == null avg 36.7 ns, Ratio was 1.00
Map.containsKey avg 34.3 ns, Map.get() == null avg 35.1 ns, Ratio was 1.02
Map.containsKey avg 36.7 ns, Map.get() == null avg 35.1 ns, Ratio was 0.96
Map.containsKey avg 36.3 ns, Map.get() == null avg 35.1 ns, Ratio was 0.97
Map.containsKey avg 36.7 ns, Map.get() == null avg 35.2 ns, Ratio was 0.96

for five million keys

Map.containsKey avg 40.1 ns, Map.get() == null avg 40.9 ns, Ratio was 1.02
Map.containsKey avg 38.6 ns, Map.get() == null avg 40.4 ns, Ratio was 1.04
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.3 ns, Ratio was 0.97
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.3 ns, Ratio was 0.98
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.8 ns, Ratio was 0.99

BTW: The time complexity for get() and containsKey is O(1) (on an idealized machine), but you can see that for a real machine, the cost increases with the size of the Map.

Peter Lawrey
  • Your countGetIsNull performs an extra operation. A similar operation is the difference between get and contains based on the source, so this little bit of extra work is skewing your results. I am not trying to emulate contains with get; all I was wondering is why contains would be slower at all, even considering JIT optimizations. The execution path for both should be almost identical and therefore benefit from the JIT in a similar way, and still there is a 15% difference. – Stefan Nov 04 '11 at 13:46
  • The average ratio is 0.93, making a 7% difference. The extra check appears to match the difference. This is fairly small compared with the 8x difference in timings (200 ns vs 25 ns). I suspect part of that is due to having an old version of the JVM. Java 6 update 17 was released 2009-11-04. – Peter Lawrey Nov 04 '11 at 15:09
  • 1
    I actually use an IBM JDK here at work and, surprisingly, Contains is faster than Get (not by much, though). – Stefan Nov 04 '11 at 15:46
  • Can you try Oracle JDK 7 update 1 or Java 6 update 29 for comparison? – Peter Lawrey Nov 04 '11 at 16:24