(Answer credit: Renzo Borgatti (@reborg).)
First, let's set up some sample data, which we'll use for performance tests later. This vector contains 500k maps with the same key. Values are overlapping 1/5th of the time.
(def data
(mapv hash-map
(repeat :samplevalue)
(concat (range 1e5)
(range 1e5)
(range 1e5)
(range 1e5)
(range 1e5))))
Now let's do your transformation with transducers. Note that this solution is not parallel. I shortened your .intValue
to just int
, which does the same thing. Also, conditionally fetching :samplevalue
from each map can be shortened to just (keep :samplevalue sequence)
, which is equivalent to (remove nil? (map :samplevalue sequence))
. We'll use Criterium to benchmark.
(require '[criterium.core :refer [quick-bench]])
(quick-bench
(transduce
(comp
(keep :samplevalue)
(map int))
(completing #(assoc! %1 %2 (inc (get %1 %2 0))) persistent!)
(transient {})
data))
;; My execution time mean: 405 ms
Note that we're not calling frequencies
as an external step this time. Instead, we've woven it into the operation. And just like what frequencies
does, we've done the operations on a transient hashmap for extra performance. We do this by using a transient hashmap as the seed and completing
the final value by calling persistent!
on it.
We can make this parallel. For maximum performance, we use a mutable Java ConcurrentHashMap
instead of an immutable Clojure data structure.
(require '[clojure.core.reducers :as r])
(import '[java.util HashMap Collections Map]
'java.util.concurrent.atomic.AtomicInteger
'java.util.concurrent.ConcurrentHashMap)
(quick-bench
(let [concurrency-level (.availableProcessors (Runtime/getRuntime))
m (ConcurrentHashMap. (quot (count data) 2) 0.75 concurrency-level)
combinef (fn ([] m) ([_ _])) ; just return `m` from the combine step
rf (fn [^Map m k]
(let [^AtomicInteger v (or (.get m k) (.putIfAbsent m k (AtomicInteger. 1)))]
(when v (.incrementAndGet v))
m))
reducef ((comp (keep :samplevalue) (map int)) rf)]
(r/fold combinef reducef data)
(into {} m)))
;; My execution time mean: 70 ms
Here we use fold
from the clojure.core.reducers
library to achieve parallelism. Note that in a parallel context any transducers one uses need to be stateless. Also note that a ConcurrentHashMap
doesn't support using nil
as a key or value; fortunately, we don't need to do that here.
The output is converted into an immutable Clojure hashmap at the end. You can remove that step and just use the ConcurrentHashMap instance for an additional speedup—on my machine, removing the into
step makes the whole fold
take about 26ms.
Edit 2017-11-20: User @clojuremostly correctly pointed out that an earlier version of this answer had the call to quick-bench
inside the let
block that initialized the concurrent hash map instance, which meant that the benchmark used the same instance for all of its runs. I moved the call to quick-bench
to be outside the let
block. It did not significantly affect the results.