Why is calling containsKey on a HashMap slower than get?
Test: http://ideone.com/QsWXF (>15% difference, run on sun-jdk-1.6.0.17)
Because it does [ever so slightly] more work; see the OpenJDK 7 source.
Note that containsKey calls getEntry while get directly "does the magic lookup". See John B's and Ted Hopp's comments as to why it was done this way.
get has an early code split for a null key (note that get will return null if the entry doesn't exist or if the key is mapped to a stored value of null):
315 if (key == null)
316 return getForNullKey();
...
322 if (e.hash == hash &&
((k = e.key) == key || key.equals(k)))
323 return e.value;
While getEntry, called from containsKey, does not split to getForNullKey, and there is additional work here to check for the null-key case (for each Entry scanned in the chain):
366 if (e.hash == hash &&
367 ((k = e.key) == key || (key != null && key.equals(k))))
368 return e;
Also, containsKey has the additional conditional and method call (note that getEntry will return an Entry object, if said key exists, even if the stored value is null):
352 return getEntry(key) != null;
I suppose it could be argued that containsKey would benefit - in terms of "performance" - from having a specialized form (at the expense of less-DRY code), or that getEntry could follow the lead of get with an early null-key check .. on the other hand, it might be argued that get should be written in terms of getEntry ;-)
Happy coding.
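To make the null-value wrinkle above concrete, here is a small illustrative demo (the class name NullSemantics is mine, not from the thread) showing where get() == null and containsKey() disagree:

```java
import java.util.HashMap;
import java.util.Map;

public class NullSemantics {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("present", null); // key exists, stored value is null
        map.put(null, "nullKey"); // HashMap also permits one null key

        // get() alone cannot distinguish "absent" from "mapped to null":
        System.out.println(map.get("present")); // null
        System.out.println(map.get("absent"));  // null

        // containsKey() can:
        System.out.println(map.containsKey("present")); // true
        System.out.println(map.containsKey("absent"));  // false

        // The null key is the case served by getForNullKey() in the JDK 6/7 source:
        System.out.println(map.get(null)); // nullKey
    }
}
```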
I haven't tried to reproduce your results yet, but my first guess is that get simply returns the value (if any) that was found, while containsKey (which is what you tested, not contains) needs to test whether the key exists (with either a null or non-null value) and then return a boolean. Just a little more overhead involved.
Let's see the source code:
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

public boolean containsKey(Object key) {
    return getEntry(key) != null;
}

final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
Maybe it's because of the extra method call or because of the repeated check for key != null?
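If the extra method call and the Entry-returning indirection were the cost, a specialized containsKey that mirrors get's loop directly could avoid both. This is purely hypothetical - it is not the actual OpenJDK code, and it refers to HashMap's internal table, hash and indexFor, so it is a sketch rather than compilable standalone code:

```java
// Hypothetical: a specialized containsKey written inside HashMap,
// mirroring get's loop instead of delegating to getEntry.
public boolean containsKey(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return true; // key found; the stored value may still be null
    }
    return false;
}
```

This is the "specialized form at the expense of less-DRY code" trade-off mentioned in the first answer.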
Testing different sizes of HashMap, if there is a bias in performance, it's very small.
Running on Java 7 update 1 with a 4.6 GHz i7 2600K.
public class HashMapPerfMain {
    public static void main(String... args) {
        Integer[] keys = generateKeys(2 * 1000 * 1000); // vary this constant to reproduce the different sizes below
        Map<Integer, Boolean> map = new HashMap<Integer, Boolean>();
        for (int j = 0; j < keys.length; j += 2)
            map.put(keys[j], true);

        for (int t = 0; t < 5; t++) {
            long start = System.nanoTime();
            int count = countContainsKey(map, keys);
            long time = System.nanoTime() - start;
            assert count == keys.length / 2;

            long start2 = System.nanoTime();
            int count2 = countGetIsNull(map, keys);
            long time2 = System.nanoTime() - start2;
            assert count2 == keys.length / 2;

            System.out.printf("Map.containsKey avg %.1f ns, ", (double) time / keys.length);
            System.out.printf("Map.get() == null avg %.1f ns, ", (double) time2 / keys.length);
            System.out.printf("Ratio was %.2f%n", (double) time2 / time);
        }
    }

    private static int countContainsKey(Map<Integer, Boolean> map, Integer[] keys) {
        int count = 0;
        for (Integer i : keys) {
            if (map.containsKey(i)) count++;
        }
        return count;
    }

    private static int countGetIsNull(Map<Integer, Boolean> map, Integer[] keys) {
        int count = 0;
        for (Integer i : keys) {
            if (map.get(i) == null) count++;
        }
        return count;
    }

    private static Integer[] generateKeys(int size) {
        Integer[] keys = new Integer[size];
        Random random = new Random();
        for (int i = 0; i < keys.length; i++)
            keys[i] = random.nextInt();
        return keys;
    }
}
prints for half a million keys
Map.containsKey avg 27.1 ns, Map.get() == null avg 26.4 ns, Ratio was 0.97
Map.containsKey avg 19.6 ns, Map.get() == null avg 19.6 ns, Ratio was 1.00
Map.containsKey avg 18.3 ns, Map.get() == null avg 19.0 ns, Ratio was 1.04
Map.containsKey avg 18.2 ns, Map.get() == null avg 19.1 ns, Ratio was 1.05
Map.containsKey avg 18.3 ns, Map.get() == null avg 19.0 ns, Ratio was 1.04
prints for one million keys
Map.containsKey avg 30.9 ns, Map.get() == null avg 30.9 ns, Ratio was 1.00
Map.containsKey avg 26.0 ns, Map.get() == null avg 25.5 ns, Ratio was 0.98
Map.containsKey avg 25.0 ns, Map.get() == null avg 24.9 ns, Ratio was 1.00
Map.containsKey avg 25.0 ns, Map.get() == null avg 24.9 ns, Ratio was 1.00
Map.containsKey avg 24.8 ns, Map.get() == null avg 25.0 ns, Ratio was 1.01
however for two million keys
Map.containsKey avg 36.5 ns, Map.get() == null avg 36.7 ns, Ratio was 1.00
Map.containsKey avg 34.3 ns, Map.get() == null avg 35.1 ns, Ratio was 1.02
Map.containsKey avg 36.7 ns, Map.get() == null avg 35.1 ns, Ratio was 0.96
Map.containsKey avg 36.3 ns, Map.get() == null avg 35.1 ns, Ratio was 0.97
Map.containsKey avg 36.7 ns, Map.get() == null avg 35.2 ns, Ratio was 0.96
for five million keys
Map.containsKey avg 40.1 ns, Map.get() == null avg 40.9 ns, Ratio was 1.02
Map.containsKey avg 38.6 ns, Map.get() == null avg 40.4 ns, Ratio was 1.04
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.3 ns, Ratio was 0.97
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.3 ns, Ratio was 0.98
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.8 ns, Ratio was 0.99
BTW: The time complexity of get() and containsKey() is O(1) (on an idealized machine), but you can see that on a real machine the cost increases with the size of the Map.
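One way to see that this growth is a memory effect rather than extra key comparisons: an instrumented key class that counts equals() calls (CountingKey below is my own illustration, not from the thread) shows that the number of comparisons in a successful lookup stays flat as the map grows, so the per-lookup slowdown must come from cache misses, not from the algorithm:

```java
import java.util.HashMap;
import java.util.Map;

public class CountingKey {
    static int equalsCalls = 0;

    final int id;
    CountingKey(int id) { this.id = id; }

    @Override public int hashCode() { return id; }

    @Override public boolean equals(Object o) {
        equalsCalls++; // count every key comparison HashMap performs
        return o instanceof CountingKey && ((CountingKey) o).id == id;
    }

    public static void main(String[] args) {
        for (int size : new int[] { 1000, 1000000 }) {
            Map<CountingKey, Boolean> map = new HashMap<CountingKey, Boolean>();
            for (int i = 0; i < size; i++)
                map.put(new CountingKey(i), Boolean.TRUE);

            equalsCalls = 0;
            map.containsKey(new CountingKey(size / 2)); // one successful lookup
            System.out.println(size + " entries -> " + equalsCalls + " equals() call(s)");
        }
    }
}
```

With sequential ids the buckets are essentially collision-free, so the count should be the same small number for both sizes even though the wall-clock cost per lookup differs.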