84

How is the performance of ConcurrentHashMap compared to HashMap, especially .get() operation (I'm especially interested for the case of only few items, in the range between maybe 0-5000)?

Is there any reason not to use ConcurrentHashMap instead of HashMap?

(I know that null values aren't allowed)

Update

just to clarify, obviously the performance in case of actual concurrent access will suffer, but how compares the performance in case of no concurrent access?

Mauli
  • 16,863
  • 27
  • 87
  • 114

8 Answers8

111

I was really surprised to find this topic to be so old and yet no one has yet provided any tests regarding the case. Using ScalaMeter I have created tests of add, get and remove for both HashMap and ConcurrentHashMap in two scenarios:

  1. using single thread
  2. using as many threads as I have cores available. Note that because HashMap is not thread-safe, I simply created separate HashMap for each thread, but used one, shared ConcurrentHashMap.

Code is available on my repo.

The results are as follows:

  • X axis (size) presents number of elements written to the map(s)
  • Y axis (value) presents time in milliseconds

Add method Get method Remove method

The summary

  • If you want to operate on your data as fast as possible, use all the threads available. That seems obvious, each thread has 1/nth of the full work to do.

  • If you choose a single thread access use HashMap, it is simply faster. For add method it is even as much as 3x more efficient. Only get is faster on ConcurrentHashMap, but not much.

  • When operating on ConcurrentHashMap with many threads it is similarly effective to operating on separate HashMaps for each thread. So there is no need to partition your data in different structures.

To sum up, the performance for ConcurrentHashMap is worse when you use with single thread, but adding more threads to do the work will definitely speed-up the process.

Testing platform

AMD FX6100, 16GB Ram
Xubuntu 16.04, Oracle JDK 8 update 91, Scala 2.11.8

Atais
  • 10,857
  • 6
  • 71
  • 111
  • 6
    Nice analysis, but Collections.synchronizedCollection() locks the object for each read/write, which is not how ConcurrentHashMap works. So I wouldn't try to infer anything about the performance of ConcurrentHashMap vs HashMap (which is what the question asks) from your stats. Perhaps create another question: "Collections.synchronizedCollection() vs TreeMap" and post your answer there? – Jan Żankowski Aug 25 '15 at 08:30
  • 1
    @Atais what software can draw this graph? – Flory Li Dec 20 '15 at 03:00
  • Can you describe each of 4 data set you have on graph? I am not sure whether you compare apples to apples. If there is copy of HashMap per thread there should be comparison with copy of ConcurrentHashMap per thread (this has a little sense). If you comparing mechanizm to share data across threads then you need single data structure instance and compare different locking mechanizm. Otherwise you comparing performance of software that does not work (HashMap modified by multiple threads) with software that does work(ConcurrentHashMap modified by multiple threads). – JJ Roman Aug 30 '21 at 08:40
  • The benchmark seems flawed, why does the `get` graph show that it is O(n)? It should be O(1), or in the absolute worst case (post Java 8) O(log(n)) – Deanveloper Sep 04 '21 at 15:35
96

Thread safety is a complex question. If you want to make an object thread safe, do it consciously, and document that choice. People who use your class will thank you if it is thread safe when it simplifies their usage, but they will curse you if an object that once was thread safe becomes not so in a future version. Thread safety, while really nice, is not just for Christmas!

So now to your question:

ConcurrentHashMap (at least in Sun's current implementation) works by dividing the underlying map into a number of separate buckets. Getting an element does not require any locking per se, but it does use atomic/volatile operations, which implies a memory barrier (potentially very costly, and interfering with other possible optimisations).

Even if all the overhead of atomic operations can be eliminated by the JIT compiler in a single-threaded case, there is still the overhead of deciding which of the buckets to look in - admittedly this is a relatively quick calculation, but nevertheless, it is impossible to eliminate.

As for deciding which implementation to use, the choice is probably simple.

If this is a static field, you almost certainly want to use ConcurrentHashMap, unless testing shows this is a real performance killer. Your class has different thread safety expectations from the instances of that class.

If this is a local variable, then chances are a HashMap is sufficient - unless you know that references to the object can leak out to another thread. By coding to the Map interface, you allow yourself to change it easily later if you discover a problem.

If this is an instance field, and the class hasn't been designed to be thread safe, then document it as not thread safe, and use a HashMap.

If you know that this instance field is the only reason the class isn't thread safe, and are willing to live with the restrictions that promising thread safety implies, then use ConcurrentHashMap, unless testing shows significant performance implications. In that case, you might consider allowing a user of the class to choose a thread safe version of the object somehow, perhaps by using a different factory method.

In either case, document the class as being thread safe (or conditionally thread safe) so people who use your class know they can use objects across multiple threads, and people who edit your class know that they must maintain thread safety in future.

Bill Michell
  • 8,240
  • 3
  • 28
  • 33
  • 7
    @Stu, over a year later I found this post and found Bill's answer incredibly helpful. Whether or not the OP was grateful enough to accept the answer, I'm still thankful Bill took the time to write it out. @Bill, Thanks! – cottonBallPaws Jan 15 '11 at 00:19
  • @littleFluffyKitty: That is very good to hear. My comment was more directed at @Mauli :P – Stu Thompson Jan 17 '11 at 09:23
  • @BillMichell Thanks! I just discovered this and your answer not just makes the decision perfectly clear, it also teaches an important aspect of software development - never forgetting that others will use and possibly edit your code someday! – anuragw Dec 26 '12 at 14:06
  • 3
    Interestingly, "Uncle Bob's" [Clean Code](https://books.google.com.au/books?id=_i6bDeoCQzsC&lpg=PP1&dq=clean%20code&pg=PT401#v=onepage&q=concurrenthashmap&f=false) book mentions this: `the ConcurrentHashMap implementation performs better than HashMap in nearly all situations`. There are no stats provided to back this up, but I'd be interested to see this verified... – Dylan Watson Jan 30 '15 at 05:32
  • I accepted the other answer, since it was more in line with what I was looking for, although I would like to see some more performance comparisons with actual numbers. – Mauli Oct 14 '16 at 08:38
  • @Bill Michell you mention that atomic operations will slow things down and that might be true, but in addition, having separated buckets might make it harder for the CPU cache to contain all the values in a highly used Map which might make a difference to the performance, especially in any test specifically testing Map performance. – mjaggard Jul 11 '22 at 09:32
4

I would recommend you measure it, since (for one reason) there may be some dependence on the hashing distribution of the particular objects you're storing.

Brian Agnew
  • 268,207
  • 37
  • 334
  • 440
3

The standard hashmap provides no concurrency protection whereas the concurrent hashmap does. Before it was available, you could wrap the hashmap to get thread safe access but this was coarse grain locking and meant all concurrent access got serialised which could really impact performance.

The concurrent hashmap uses lock stripping and only locks items that affected by a particular lock. If you're running on a modern vm such as hotspot, the vm will try and use lock biasing, coarsaning and ellision if possible so you'll only pay the penalty for the locks when you actually need it.

In summary, if your map is going to be accesaed by concurrent threads and you need to guarantee a consistent view of it's state, use the concurrent hashmap.

Robert Christie
  • 20,177
  • 8
  • 42
  • 37
2

In the case of a 1000 element hash table using 10 locks for whole table saves close to half the time when 10000 threads are inserting and 10000 threads are deleting from it.

The interesting run time difference is here

Always use Concurrent data structure. except when the downside of striping (mentioned below) becomes a frequent operation. In that case you will have to acquire all the locks? I read that the best ways to do this is by recursion.

Lock striping is useful when there is a way of breaking a high contention lock into multiple locks without compromising data integrity. If this is possible or not should take some thought and is not always the case. The data structure is also the contributing factor to the decision. So if we use a large array for implementing a hash table, using a single lock for the entire hash table for synchronizing it will lead to threads sequentially accessing the data structure. If this is the same location on the hash table then it is necessary but, what if they are accessing the two extremes of the table.

The down side of lock striping is it is difficult to get the state of the data structure that is affected by striping. In the example the size of the table, or trying to list/enumerate the whole table may be cumbersome since we need to acquire all of the striped locks.

0

What answer are you expecting here?

It is obviously going to depend on the number of reads happening at the same time as writes and how long a normal map must be "locked" on a write operation in your app (and whether you would make use of the putIfAbsent method on ConcurrentMap). Any benchmark is going to be largely meaningless.

oxbow_lakes
  • 133,303
  • 56
  • 317
  • 449
0

It's not clear what your mean. If you need thread safeness, you have almost no choice - only ConcurrentHashMap. And it's definitely have performance/memory penalties in get() call - access to volatile variables and lock if you're unlucky.

Vitaly
  • 1,081
  • 2
  • 9
  • 14
0

Of course a Map without any lock system wins against one with thread-safe behavior which needs more work. The point of the Concurrent one is to be thread safe without using synchronized so to be faster than HashTable. Same graphics would would be very interesting for ConcurrentHashMap vs Hashtable (which is synchronized).