5

Are there any published micro-benchmarks that compare the Scala mutable and immutable collections with each other, and with the collections in java.util.concurrent, in multi-threaded environments? I am particularly interested in cases where readers far outnumber writers, such as caching HashMaps in server-side code.

Micro-benchmarks of the Clojure collections would also be acceptable, as their algorithms are similar to those used in the Scala 2.8 persistent collections.

I'll write my own if there are none already done, but writing good micro-benchmarks is not trivial.

Ralph
  • I think it is exceedingly unlikely you'll get any reasonable benchmark that compares mutable and immutable collections, because the design of the application itself is different. – Daniel C. Sobral Sep 27 '11 at 17:07
  • @Daniel: We currently have some Java server code that contains HashMaps that are read about 1,000,000 times for every write. The code uses `synchronized`, but the readers pay a penalty for all of those contended reads even though the data is effectively immutable. I thought that I might be able to use the persistent collections from functionaljava and lock only when replacing the old collection with the new "copied" collection containing the new item. – Ralph Sep 27 '11 at 18:40
  • Looks like a reasonable expectation, and it illustrates the problem with benchmarks. If you test that kind of load, you are biasing for immutability. But note that, using immutable maps, you have to _replace_ the map whenever you update, meaning you'll need to serialize all updates somehow. The map itself might be pointed to by a volatile, if you don't mind reads lagging behind writes. – Daniel C. Sobral Sep 27 '11 at 23:12
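
The approach discussed in the comments above (an immutable snapshot behind a volatile reference, with writers serialized) could look roughly like the following. This is only a sketch: `SnapshotCache` is a made-up name, and it copies a plain `java.util.HashMap` on every write as a stand-in for a persistent map, so each write costs O(n) rather than the near-constant cost a real persistent collection would give.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, for illustration only: a copy-on-write cache for read-mostly data.
final class SnapshotCache<K, V> {
    // Readers see whichever fully built snapshot was published last.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    // Lock-free read: one volatile load, then a lookup in a map that is never mutated.
    V get(K key) {
        return snapshot.get(key);
    }

    // Writers are serialized so two concurrent updates cannot drop each other's entry.
    synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot);     // full copy; assumption: writes are rare
        copy.put(key, value);
        snapshot = Collections.unmodifiableMap(copy); // publish the new snapshot
    }

    int size() {
        return snapshot.size();
    }
}
```

Readers never take a lock, but as noted above a read may briefly observe the previous snapshot while a write is in progress.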

3 Answers

2

There are some results comparing Java hash maps, Scala hash maps, Java concurrent hash maps, Java concurrent skip lists, Java parallel arrays and Scala parallel collections here (at the end of the technical report):

http://infoscience.epfl.ch/record/165523/files/techrep.pdf

There is a more detailed comparison of concurrent skip lists and Java concurrent hash maps here (also at the end of the main part of the report, before the appendix):

http://infoscience.epfl.ch/record/166908/files/ctries-techreport.pdf

These micro-benchmarks each focus on testing the performance of a single operation. If you plan to write your own benchmarks, this will probably be useful:

http://buytaert.net/files/oopsla07-georges.pdf

axel22
  • You can find the source code for some of the benchmarks in the 1st paper here: http://lampsvn.epfl.ch/svn-repos/scala/scala/trunk/test/benchmarks/, but they're somewhat unstructured. For the 2nd: https://github.com/axel22/Ctries, take a look in the src/bench. – axel22 Sep 28 '11 at 15:28
  • Here are the detailed benchmarks on Scala collection operations: https://github.com/scalameter/scalameter/tree/master/src/test/scala/org/scalameter/collections – axel22 Dec 29 '14 at 21:19
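
For anyone writing their own, the bare minimum (warm up first, repeat the measurement, report the spread rather than a single number) might be sketched as below. This is not the methodology of any of the linked reports; `ReadBench` and its parameters are invented for illustration, the map is assumed to be non-empty with keys `0..size-1`, and `runs` is assumed to be at least 2. A proper harness such as the ScalaMeter suite linked above handles far more than this.

```java
import java.util.Map;

// Hypothetical helper, for illustration only: a naive timing loop for map reads.
final class ReadBench {
    // Written after each timed loop so the JIT cannot discard the lookups as dead code.
    static volatile long sink;

    // Times `reads` lookups and returns the elapsed time in nanoseconds.
    static long timeReads(Map<Integer, Integer> map, int reads) {
        long acc = 0;
        int size = map.size();
        long start = System.nanoTime();
        for (int i = 0; i < reads; i++) {
            Integer v = map.get(i % size);
            if (v != null) acc += v;
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    // Returns { mean, sample standard deviation } in nanoseconds per read.
    static double[] measure(Map<Integer, Integer> map, int warmups, int runs, int reads) {
        for (int i = 0; i < warmups; i++) {
            timeReads(map, reads); // results discarded: gives the JIT a chance to compile
        }
        double[] samples = new double[runs];
        double sum = 0;
        for (int i = 0; i < runs; i++) {
            samples[i] = (double) timeReads(map, reads) / reads;
            sum += samples[i];
        }
        double mean = sum / runs;
        double squares = 0;
        for (double s : samples) {
            squares += (s - mean) * (s - mean);
        }
        return new double[] { mean, Math.sqrt(squares / (runs - 1)) };
    }
}
```

A large standard deviation relative to the mean is a hint that the numbers are not yet trustworthy (too few warm-up rounds, GC pauses, other load on the machine).
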
1

Li Haoyi's Benchmarking Scala Collections is a detailed and comprehensive study that addresses your query. It is way too long to quote here.

Mike Slinn
0

Why don't you try using java.util.concurrent.ConcurrentHashMap then? That way you don't have to synchronize, and your million reads will be much faster (as well as the one write).

Chochos
  • I believe that is a fallacy. If you read a (mutable) value from the HashMap, update that value, then try to put it back, you must still synchronize the operation. I'm not sure if you can do it with `replace`. – Ralph Sep 27 '11 at 20:12
  • Another option would be to use Clojure's STM (you can use it from Java). Although I'm not sure how you'd use it with a HashMap... – Chochos Sep 27 '11 at 21:11
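
On the `replace` question in the comments: `ConcurrentMap` does offer a three-argument `replace(key, oldValue, newValue)` that succeeds only if the key is still mapped to `oldValue`, so a read-modify-write can be done as a retry loop without `synchronized`. A rough sketch follows, assuming the values are immutable (here `Integer`); `AtomicCounters` is a made-up name. If the stored value is itself a mutable object that gets modified in place, this does not help and that object still needs its own synchronization.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical class, for illustration only: per-key counters without explicit locks.
final class AtomicCounters {
    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    // Compare-and-set retry loop built on putIfAbsent and replace(key, old, new).
    int increment(String key) {
        for (;;) {
            Integer old = counts.get(key);
            if (old == null) {
                if (counts.putIfAbsent(key, 1) == null) {
                    return 1;
                }
            } else {
                int next = old + 1;
                if (counts.replace(key, old, next)) {
                    return next;
                }
            }
            // Another thread changed the entry in between; loop and retry with the fresh value.
        }
    }

    // On Java 8 or later, merge performs the same read-modify-write atomically.
    int incrementWithMerge(String key) {
        return counts.merge(key, 1, Integer::sum);
    }

    Integer get(String key) {
        return counts.get(key);
    }
}
```

Under heavy write contention the retry loop can spin, but with reads outnumbering writes a million to one that should rarely matter.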