I'm fairly new to Clojure and I have some code that I am trying to optimise. I want to compute co-occurrence counts. The main function is compute-space and the output is a nested map of the form
{"w1" {"w11" 10, "w12" 31, ...}
"w2" {"w21" 14, "w22" 1, ...}
...
}
meaning that "w1" co-occurs with "w11" 10 times, and so on.
It takes a coll of documents (sentences) and a coll of target words, iterates over both, and finally applies the context-fn (such as sliding-window) to extract the context words. More concretely, I am passing a closure over sliding-window:
(compute-space docs (fn [target doc] (sliding-window target doc 5)) targets)
I've been testing it with around 50 million words (~3 million sentences) and ca. 20,000 targets. This version takes more than a day to complete. I also wrote a pmap-based parallel function (pcompute-space, below) that brings the running time down to around 10 hours, but I still feel it should be faster. I don't have other code to compare against, but my intuition tells me it should be faster.
(defn compute-space
  ([docs context-fn targets]
   (let [space (atom {})]
     (doseq [doc docs
             target targets]
       (when-let [contexts (context-fn target doc)]
         (doseq [w contexts]
           ;; bump the count for [target w], initialising it to 1 the first time
           (if (get-in @space [target w])
             (swap! space update-in [target w] (partial inc))
             (swap! space assoc-in [target w] 1)))))
     @space)))
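To make the expected output concrete, here is a toy call (using an ad-hoc context-fn that simply returns every other word in the sentence, just for illustration):
(compute-space [["a" "b" "c"]]
               (fn [target doc] (remove #{target} doc))
               ["a" "b"])
;;=> {"a" {"b" 1, "c" 1}, "b" {"a" 1, "c" 1}}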
(defn sliding-window
  "Returns the words within n positions to the left and to the right of
  each occurrence of target in the sentence s."
  [target s n]
  (loop [todo s seen [] acc []]
    (let [curr (first todo)]
      (cond
        (= curr target) (recur (rest todo) (cons curr seen)
                               (concat acc (take n seen) (take n (rest todo))))
        (empty? todo) acc
        :else (recur (rest todo) (cons curr seen) acc)))))
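For example, with a window of 2 on each side:
(sliding-window "b" ["a" "b" "c" "d" "e" "f"] 2)
;;=> ("a" "c" "d")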
(defn pcompute-space
  [docs step context-fn targets]
  (reduce
    #(deep-merge-with + %1 %2)
    (pmap
      (fn [chunk]
        (tick) ; progress helper, definition omitted
        (compute-space chunk context-fn targets))
      (partition-all step docs))))
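deep-merge-with and tick are small helpers: tick just reports progress, and deep-merge-with is essentially the old clojure.contrib.map-utils version, included here so the snippet is self-contained:
(defn deep-merge-with
  "Like merge-with, but merges maps recursively, applying f only when
  there is a non-map value at a given level."
  [f & maps]
  (apply
    (fn m [& maps]
      (if (every? map? maps)
        (apply merge-with m maps)
        (apply f maps)))
    maps))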
I profiled the application with jvisualvm and found out that clojure.lang.Cons, clojure.lang.ChunkedCons and clojure.lang.ArrayChunk dominate the profile quite excessively (see picture). This surely has to do with the double doseq loop (earlier experiments suggested this approach was faster than map, reduce and the like, although I was only using time to benchmark the functions).
I'd be very thankful for any insights you could provide, and for suggestions on how to refactor the code to make it run faster. I guess reducers could be of some help here, but I'm not sure how or why.
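For what it's worth, this is roughly the direction I was imagining with reducers: an untested sketch (the function name and the chunk size are just placeholders) that folds over a vector of the docs, builds per-chunk count maps with frequencies, and merges the chunks with a nested merge-with +.
(require '[clojure.core.reducers :as r])

(defn fold-compute-space
  "Sketch only: parallel fold over a vector of docs."
  [docs context-fn targets]
  (r/fold
    512                                   ; docs per leaf chunk, to be tuned
    (fn combinef
      ([] {})                             ; identity map for each chunk
      ([m1 m2] (merge-with (partial merge-with +) m1 m2)))
    (fn reducef [space doc]
      (reduce
        (fn [space target]
          (if-let [contexts (seq (context-fn target doc))]
            (update-in space [target]
                       (fnil #(merge-with + % (frequencies contexts)) {}))
            space))
        space
        targets))
    (vec docs)))                          ; fold only parallelises vectors (and maps)
But fold wants the whole input realised as a vector, and with only two cores I don't know whether it would actually beat pmap over partition-all, which is part of what I'm asking.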
SPECS
MacPro 2010, 2.4 GHz Intel Core 2 Duo, 4 GB RAM
Clojure 1.6.0
Java 1.7.0_51 Java HotSpot(TM) 64-Bit Server VM