
Using Clojure, I have a very large amount of data in a sequence, and I want to process it in parallel with a relatively small number of cores (4 to 8).

The easiest thing to do is use `pmap` instead of `map` to map my processing function over the sequence of data. But in my case the coordination overhead results in a net loss.

I think the reason is that `pmap` assumes the function mapped across the data is very costly. Looking at `pmap`'s source code, it appears to construct a future for each element of the sequence in turn, so each invocation of the function occurs on a separate thread (cycling over the number of available cores).

Here is the relevant piece of pmap's source:

```clojure
(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ;; multi-collection form of pmap elided
  )
```

In my case the mapped function is not that expensive, but the sequence is huge (millions of records). I think the cost of creating and dereferencing that many futures is where the parallel gain is lost in overhead.

Is my understanding of pmap correct?

Is there a better pattern in Clojure for this sort of lower-cost but massively repeated processing than `pmap`? I am considering somehow chunking the data sequence and then running threads on larger chunks. Is this a reasonable approach, and what Clojure idioms would work?

Alex Stoddard
  • don't forget to take advantage of memoization if applicable. http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/memoize – Brian Gianforcaro Jan 22 '10 at 02:17

4 Answers


This question, how-to-efficiently-apply-a-medium-weight-function-in-parallel, also addresses this problem in a very similar context.

The current best answer is to use `partition` to break the data into chunks, then `pmap` a map function onto each chunk, then recombine the results, map-reduce style.
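A minimal sketch of that pattern. The chunk size (1000) and the name `chunked-pmap` are arbitrary illustrations; tune the chunk size so that per-chunk work dominates the per-future coordination overhead:

```clojure
;; Sketch: parallelize over chunks rather than individual elements.
;; partition-all keeps the short final chunk instead of dropping it.
(defn chunked-pmap [f coll]
  (->> coll
       (partition-all 1000)                       ; split into chunks
       (pmap (fn [chunk] (doall (map f chunk))))  ; one future per chunk
       (apply concat)))                           ; recombine the results
```

The `doall` forces each chunk's work to happen inside its future, rather than lazily on the consuming thread.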

Arthur Ulfeldt
  • Would I really want to use `pmap` on each chunk? I think that will still create a future per seq item. It would make more sense to me to just `map` a `future` onto each chunk. – Alex Stoddard Jan 20 '10 at 21:26
  • The idea is to increase the chunk size so that it beats the coordination overhead while still filling all the cores. Not all data sets have a sweet spot like this. – Arthur Ulfeldt Jan 21 '10 at 00:35
  • Ah-ha. I needed to think at a level of one additional abstraction. I `pmap` a function over the chunks, and that function will `map` my processing function over each member of the chunk. Is that what you mean? – Alex Stoddard Jan 21 '10 at 13:42
  • That is the map-reduce style of processing. Split your input to chunks, fire jobs in parallel on each chunk and then join the results. http://en.wikipedia.org/wiki/MapReduce . The only question is the size and the number of chunks. – edbond Jan 21 '10 at 14:53
  • @Alex Stoddard, that's it. I'm fighting back the urge to say that all problems can be solved by one more level of abstraction. – Arthur Ulfeldt Jan 21 '10 at 19:39
  • One has to be careful not to (silently!) skip some of the inputs with `partition`, due to the fact that it never produces chunks smaller than specified. E.g. `(partition 5 [1 2])` evaluates to an empty lazy seq! `clojure.contrib.seq-utils/partition-all` (soon to be `clojure.contrib.seq/partition-all`) puts together a short final chunk instead (`((1 2))` with arguments as above). – Michał Marczyk Feb 01 '10 at 22:08
  • `(partition 5 5 '() [1 2])` will leave the small-sized chunk on the end and not drop anything. – Arthur Ulfeldt Sep 14 '10 at 21:01
  •
    Is the Clojure reducers library a better solution now? – Daniel Compton Jan 24 '14 at 23:55
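To illustrate the `partition` pitfall raised above, a quick REPL sketch using `partition-all` (which has since moved into `clojure.core`):

```clojure
;; partition silently drops a trailing chunk smaller than n,
;; while partition-all keeps it as a short final chunk.
(assert (= '() (partition 5 [1 2])))           ; the two inputs are lost
(assert (= '((1 2)) (partition-all 5 [1 2])))  ; short final chunk kept
```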

Sadly not a valid answer yet, but something to watch for in the future is Rich's work with the fork/join library coming in Java 7. If you look at his Par branch on GitHub, he has done some work with it, and last I saw, the early returns were amazing.

Example of Rich trying it out.

http://paste.lisp.org/display/84027

Runevault
  • Actually I have discovered this can be tried now with Java 6, the Clojure "par" branch from GitHub, and the jsr166y.jar file that Rich Hickey made available at: http://cloud.github.com/downloads/richhickey/clojure/jsr166y.jar – Alex Stoddard Jan 21 '10 at 19:57
  • Ohhh really? May have to give that a look, as Par looks amazing. Thanks for the tip, as I missed this. – Runevault Jan 22 '10 at 11:59
  • Is this what eventually became the reducers library? – Daniel Compton Jan 24 '14 at 18:44

The fork/join work mentioned in earlier answers on this and similar threads eventually bore fruit as the reducers library, which is probably worth a look.
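A minimal reducers sketch, assuming a vector input (vectors can be split for parallel fork/join folding; `parallel-sum-of-inc` is a made-up name for illustration):

```clojure
(require '[clojure.core.reducers :as r])

;; r/fold splits the vector into groups, maps inc over each group and
;; reduces it on its own fork/join task, then combines the partial sums
;; with + (used here as both the reduce and combine function).
(defn parallel-sum-of-inc [v]
  (r/fold + (r/map inc v)))
```

Unlike the chunked-`pmap` approach, the grouping and recombination are handled by the library rather than by hand.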

Joffer

You can use some sort of map/reduce implemented by hand. Also take a look at the swarmiji framework.

"A distributed computing system that helps writing and running Clojure code in parallel - across cores and processors"

edbond
  • swarmiji is a library for distributed computing in Clojure. I got the impression this question was focusing more on single-system parallel execution. – Arthur Ulfeldt Jan 20 '10 at 20:12