Performance for HashMap when Key is Guaranteed Unique

Question

If the keys I wish to use are guaranteed to be unique (or at least the assumption can be made that the keys are unique), does using a 'vanilla' ConcurrentHashMap provide the best performance, or does a hashing function or put method need to be modified to avoid needless hashing?

Also, does a numeric key have any performance benefit over a non-numeric key (such as a String or POJO with a proper hashing function)?

If you don't need the thread-safe aspect, don't use ConcurrentHashMap, use a HashMap — John Vint, Jul 12 '11 at 12:44

score 8 · Accepted Answer · answered Jul 12 '11 at 12:45

As already mentioned in the comments, if you don't need the thread-safe aspects then don't use ConcurrentHashMap.

If you want the absolute best performance, consider interning your keys and using an IdentityHashMap. This avoids calling hashCode() on the key (and, as mentioned in the comments, removes the need for equals() to be evaluated); the map instead hashes and compares the reference itself.

Note that you must make sure that two instances of the same key are the same object (i.e. you have to ensure reference equality, not just object equality). Interning all your keys is one approach for achieving this.
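To make the reference-equality requirement concrete, here is a minimal sketch using String keys. The class name IdentityMapSketch, the helper lookup and the key "order-42" are made up for illustration; only IdentityHashMap and String.intern() come from the JDK.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityMapSketch {
    // Plain lookup; with an IdentityHashMap the key is matched by reference (==),
    // so hashCode() and equals() of the key are never called.
    static Integer lookup(Map<String, Integer> map, String key) {
        return map.get(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new IdentityHashMap<>();
        String canonical = new String("order-42").intern(); // canonical (interned) instance
        map.put(canonical, 1);

        // equals() would say these are the same key, but it is a different object.
        String equalButDistinct = new String("order-42");
        System.out.println(lookup(map, equalButDistinct));          // null: different reference
        System.out.println(lookup(map, equalButDistinct.intern())); // 1: interning yields the canonical reference
    }
}
```

The first lookup misses even though the strings are equal, which is exactly the pitfall described above: every caller has to go through the same interning step before touching the map.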

The IdentityHashMap Javadoc says: "Implementation note: This is a simple linear-probe hash table, as described for example in texts by Sedgewick and Knuth. The array alternates holding keys and values. (This has better locality for large tables than does using separate arrays.) For many JRE implementations and operation mixes, this class will yield better performance than HashMap (which uses chaining rather than linear-probing)."

If you know all the keys, perhaps you could also consider perfect hashing? Or map to a simple array structure?
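As a rough illustration of the "simple array structure" idea, the sketch below assumes every key is known up front and has already been assigned a small dense integer id in the range 0..keyCount-1. That assignment is the (trivial) perfect hash. The class DenseArrayMap is hypothetical, not a JDK type.

```java
final class DenseArrayMap<V> {
    private final Object[] slots;

    // keyCount is the number of known keys; valid ids are 0 .. keyCount - 1.
    DenseArrayMap(int keyCount) {
        slots = new Object[keyCount];
    }

    // No hashing and no equals(): the id is used directly as the array index.
    void put(int id, V value) {
        slots[id] = value;
    }

    @SuppressWarnings("unchecked")
    V get(int id) {
        return (V) slots[id];
    }

    public static void main(String[] args) {
        DenseArrayMap<String> map = new DenseArrayMap<>(3);
        map.put(0, "alpha");
        map.put(2, "gamma");
        System.out.println(map.get(2)); // gamma
        System.out.println(map.get(1)); // null: nothing stored for id 1
    }
}
```

If the keys are enum constants, java.util.EnumMap gives much the same array-backed behaviour without hand-written code.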

It also avoids the equals() method. To perform a lookup you have to use exactly the same object. – Peter Lawrey, Jul 12 '11 at 12:46
@Peter, good point, updated the answer to reflect your comments. — Jeff Foster, Jul 12 '11 at 12:48

Peter Lawrey · Answer 2 · 2011-07-12T13:16:13.390

1

ConcurrentHashMap is the most expensive of the HashMap implementations because it is thread safe.

All Maps must have unique keys so this is a given.

Using numbers has a performance advantage if you use a collection which supports primitives, such as TLongHashMap; however, you may be able to go much faster using a custom hash map.

From http://vanillajava.blogspot.com/2011/07/low-gc-in-java-using-primitives.html

Test                                     Performance      Memory used
Use Integer wrappers and HashMap         71 - 134 (ns)    53 MB/sec
Use int primitives and HashMap           45 - 76 (ns)     36 MB/sec
Use int primitives and FastMap           58 - 93 (ns)     28 MB/sec
Use int primitives and TIntIntHashMap    18 - 28 (ns)     nonimal
Use int primitives and simple hash map   6 - 9 (ns)       nonimal
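For readers wondering what a "simple hash map" over int primitives might look like, here is a minimal sketch. It is not the implementation measured in the linked post. It assumes a fixed power-of-two capacity, no removal, and that Integer.MIN_VALUE is never used as a key; the class name SimpleIntIntMap is invented here.

```java
import java.util.Arrays;

final class SimpleIntIntMap {
    private static final int EMPTY = Integer.MIN_VALUE; // reserved key marking a free slot (assumption)
    private final int[] keys;
    private final int[] values;
    private final int mask;
    private int size;

    // capacity must be a power of two (at least 2); at most capacity - 1 entries fit.
    SimpleIntIntMap(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        keys = new int[capacity];
        values = new int[capacity];
        mask = capacity - 1;
        Arrays.fill(keys, EMPTY);
    }

    void put(int key, int value) {
        if (key == EMPTY) throw new IllegalArgumentException("reserved key");
        int i = key & mask;                       // the int key is its own hash
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask;                   // linear probing
        }
        if (keys[i] == EMPTY) {
            // Always keep one slot free so that probe loops terminate.
            if (size == mask) throw new IllegalStateException("map is full");
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int get(int key, int defaultValue) {
        int i = key & mask;
        while (keys[i] != EMPTY) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;
        }
        return defaultValue;
    }
}
```

No Integer or Entry objects are allocated on put or get, which is consistent with the near-zero memory figures in the last rows of the table.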

edited Jul 12 '11 at 13:16

answered Jul 12 '11 at 12:46

Peter Lawrey

"nonimal" means less than 0.1 MB in two minutes. i.e. less than I thought worth measuring. For this test, the map reaches a specific size almost immediately and doesn't grow after that. – Peter Lawrey Jul 12 '11 at 13:15
@Peter Lawrey I think you mean "[nominal](http://dictionary.reference.com/browse/nominal)" – matt b Jul 12 '11 at 13:59
In other words ... there is no such animal as a nonimal :-) – Stephen C Jul 12 '11 at 14:48

Stephen C · Answer 3 · 2011-07-12T14:41:14.520

If the keys I wish to use are guaranteed to be unique (or at least the assumption can be made that the keys are unique), does using a 'vanilla' ConcurrentHashMap provide the best performance,

You would typically use ConcurrentHashMap if the Map is a potential concurrency bottleneck. If your application is single threaded or if there is no contention, ConcurrentHashMap is slower than HashMap.

or does a hashing function or put method need to be modified to avoid needless hashing?

The hash function gets evaluated once per "probe" of the hash table; e.g. once per get or put operation. You can reduce the cost of the hash function by caching the result, but this costs you an extra 4 bytes of storage per key object. Whether caching is a worthwhile optimization depends on:

what the relative cost of hashing is compared with the rest of the application, and
the proportion of calls to hashCode() that will actually make use of the cached value.

Both of these factors are highly application specific.
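A minimal sketch of the caching idea, assuming an immutable composite key. The class CachedHashKey and its fields are hypothetical. The extra int field is the 4 bytes per key mentioned above, and the lazy pattern is the same one java.lang.String uses for its own hash.

```java
import java.util.Objects;

final class CachedHashKey {
    private final String region;
    private final String name;
    private int hash; // 0 means "not computed yet"; this field is the extra 4 bytes

    CachedHashKey(String region, String name) {
        this.region = Objects.requireNonNull(region);
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = 31 * region.hashCode() + name.hashCode();
            hash = h; // if the real hash happens to be 0 it is simply recomputed on each call
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedHashKey)) return false;
        CachedHashKey other = (CachedHashKey) o;
        return region.equals(other.region) && name.equals(other.name);
    }
}
```

The cache only pays off when the same key instance is hashed more than once; a key that is created, used for a single get and then discarded gains nothing from it.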

(Incidentally, the long term cost of using the identity hashcode as the hash value is also an extra 4 bytes of storage.)

Also, does a numeric key have any performance benefit over a non-numeric key (such as a String or POJO with a proper hashing function)?

The hash function is likely to be cheaper in the numeric case, but whether it is worth it depends on whether there are application-specific downsides of using a numeric key. And, as above, the relative costs are application specific. For instance, the cost of String.hashCode() is proportional to the length of the string being hashed.

score 0 · Answer 4 · answered Jul 12 '11 at 12:49

Java's HashMaps are ultimately backed by an array of Entry<K,V>, where the hash code of K is used to determine the slot in the array that the Entry is stored in.

The size of the array used (typically starts at 16) is much smaller than the number of possible hashcodes (2^32 ~= 4 billion), so there are bound to be collisions in this array, even if the hashcodes are unique.
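The point about collisions despite unique hash codes can be checked with a few lines. The sketch below assumes the table index is derived the way recent OpenJDK HashMap versions do it (spread the high bits, then mask with length - 1); older JDKs used a different mixing step, and the class BucketIndexSketch is invented for illustration.

```java
final class BucketIndexSketch {
    // Maps a hash code to a slot in a power-of-two sized table.
    static int bucketIndex(int hashCode, int tableLength) {
        int spread = hashCode ^ (hashCode >>> 16); // fold high bits into low bits
        return spread & (tableLength - 1);         // keep only the low bits
    }

    public static void main(String[] args) {
        int first = Integer.valueOf(1).hashCode();   // 1
        int second = Integer.valueOf(17).hashCode(); // 17
        System.out.println(bucketIndex(first, 16));  // 1
        System.out.println(bucketIndex(second, 16)); // 1: distinct hash codes, same slot
    }
}
```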

So long as your hashCode() method is fast, there is no difference between the types that are used as the key. Remember that hashCode() may be called many times, so if it is slow you can cache its result inside the object.

score 0 · Answer 5 · answered Jul 15 '11 at 11:05

I have a ConcurrentHashMap instance, map, which is accessed by multiple threads. See the code snippet below. How about this?

Iterator<String> it = new TreeSet<String>(map.keySet()).iterator();
while (it.hasNext()) {
    id = it.next();
    synchronized (map) {
        msg = map.get(id);
        if (msg != null)
            map.remove(id);
    }
    if (msg != null)
        listener.procMessage(msg);
}
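One possible simplification of the snippet, offered as a sketch rather than a drop-in replacement: ConcurrentHashMap.remove(key) is atomic and returns the removed value (or null), so the get-then-remove pair and the synchronized block may not be needed. The Listener interface and the use of String as the message type are assumptions made here so the example compiles, and the sketch only applies if no other code relies on locking the map.

```java
import java.util.TreeSet;
import java.util.concurrent.ConcurrentMap;

final class DrainSketch {
    // Hypothetical stand-in for the listener used in the snippet.
    interface Listener {
        void procMessage(String msg);
    }

    static void drain(ConcurrentMap<String, String> map, Listener listener) {
        // Copying the key set into a TreeSet gives a sorted snapshot of the ids.
        for (String id : new TreeSet<>(map.keySet())) {
            // Atomic: returns the removed value, or null if another thread removed it first.
            String msg = map.remove(id);
            if (msg != null) {
                listener.procMessage(msg);
            }
        }
    }
}
```

Each message is handed to the listener by exactly one thread, because only the thread whose remove call returns a non-null value processes it.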
