
I haven't used multithreading in Clojure at all so am unsure where to start.

I have a doseq whose body can run in parallel. What I'd like is for there always to be 3 threads running (leaving 1 core free) that evaluate the body in parallel until the range is exhausted. There's no shared state, nothing complicated - the equivalent of Python's multiprocessing would be just fine.

So something like:

(dopar 3 [i (range 100)]
  ; repeated 100 times in 3 parallel threads...
  ...)

Where should I start looking? Is there a command for this? A standard package? A good reference?

So far I have found pmap, and could use that (how do I restrict it to 3 at a time? It looks like it uses 32 at a time - no, the source says 2 + the number of processors), but this seems like a basic primitive that should already exist somewhere.

clarification: I really would like to control the number of threads. My processes are long-running and use a fair amount of memory, so creating a large number and hoping things work out OK isn't a good approach (for example, each one uses a significant chunk of available memory).

update: Starting to write a macro that does this, and I need a semaphore (or a mutex, or an atom i can wait on). Do semaphores exist in Clojure? Or should I use a ThreadPoolExecutor? It seems odd to have to pull so much in from Java - I thought parallel programming in Clojure was supposed to be easy... Maybe I am thinking about this completely the wrong way? Hmmm. Agents?

andrew cooke

7 Answers


OK, I think what I want is an agent for each loop iteration, with the data sent to the agent using send. Agents triggered with send run on a fixed thread pool, so the number of threads is limited in some way (it doesn't give the fine-grained control of exactly three threads, but it'll have to do for now).

[Dave Ray explains in comments: to control pool size I'd need to write my own]

(defmacro dopar [seq-expr & body]
  (assert (= 2 (count seq-expr)) "single pair of forms in sequence expression")
  (let [[k v] seq-expr]
    `(apply await
       (for [k# ~v]
         (let [a# (agent k#)]
           (send a# (fn [~k] ~@body))
           a#)))))

which can be used like:

(deftest test-dump
  (dopar [n (range 7 11)]
    (time (do-dump-single "/tmp/single" "a" n 10000000))))

Yay! Works! I rock! (OK, Clojure rocks a little bit too). Related blog post.

andrew cooke
  • If you want to control the thread pool you'll need to construct your own. Clojure makes concurrency simpler by providing tools for reducing mutation (default immutability, stm, etc), but out-of-the-box, it expects you to defer to java.util.concurrent if you need fine-grained threading control beyond what agents and futures provide. – Dave Ray Jun 10 '12 at 18:22
  • yeah, thanks. i found a blog post from someone showing how to do that (would post it, but lost it again.). – andrew cooke Jun 10 '12 at 18:30

There's actually a library now for doing exactly this. From its GitHub:

The claypoole library provides threadpool-based parallel versions of Clojure functions such as pmap, future, and for.

It provides both ordered and unordered versions of these functions.
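For instance, a minimal sketch assuming the com.climate.claypoole dependency is on the classpath:

```clojure
(require '[com.climate.claypoole :as cp])

;; A pool of exactly 3 threads; with-shutdown! closes it when the body exits.
(cp/with-shutdown! [pool (cp/threadpool 3)]
  ;; cp/pmap keeps input order; cp/upmap is the unordered variant that
  ;; yields results as they complete. Force the lazy result while the
  ;; pool is still alive.
  (doall (cp/pmap pool #(* % %) (range 10))))
```

There is also `cp/pdoseq` for the side-effecting `doseq` style asked about in the question.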

divs1210
  • How sad, after all the concurrency hyping of clojure... can't do a lot without either falling back to Java or that extra library – matanster Mar 23 '18 at 22:22
  • Took me a while to figure out that the parallel version of `doseq` in the claypoole library is called `pdoseq`. – wedesoft Mar 16 '21 at 21:33

pmap will actually work fine in most circumstances - it uses a thread pool with a sensible number of threads for your machine. I wouldn't bother trying to create your own mechanisms to control the number of threads unless you have real benchmark evidence that the defaults are causing a problem.

Having said that, if you really want to limit to a maximum of three threads, an easy approach is to just use pmap on 3 subsets of the range:

(defn split-equally
  "Split a collection into a vector of (as close as possible) equally sized parts"
  [num coll]
  (loop [num num
         parts []
         coll coll
         c (count coll)]
    (if (<= num 0)
      parts
      (let [t (quot (+ c num -1) num)]
        (recur (dec num) (conj parts (take t coll)) (drop t coll) (- c t))))))

(defmacro dopar [thread-count [sym coll] & body]
  `(doall (pmap
            (fn [vals#]
              (doseq [~sym vals#]
                ~@body))
            (split-equally ~thread-count ~coll))))

Note the use of doall, which is needed to force evaluation of the pmap (which is lazy).
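With the definitions above loaded, usage matches the shape asked for in the question (a sketch with a trivial side-effecting body):

```clojure
;; 100 iterations run on 3 threads, each thread handling one
;; contiguous chunk produced by split-equally.
(dopar 3 [i (range 100)]
  (println i))
```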

mikera
  • this is good, thanks, but i think it assumes all tasks last the same time (which they don't). – andrew cooke Jun 11 '12 at 04:21
  • ...or that there are so many that the law of averages smooths things out. i'm not trying to be difficult, but this would not be optimal either. – andrew cooke Jun 11 '12 at 04:27
  • `dorun` is a good choice over `doall` if you don't want to hold a reference to the head (less memory usage) – jocull Dec 07 '14 at 16:30

Why don't you just use pmap? You still can't control the threadpool, but it's a lot less work than writing a custom macro that uses agents (why not futures?).
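For reference, a sketch of the futures approach (note: each `future` runs on Clojure's unbounded send-off pool, so this does not cap concurrency at 3):

```clojure
;; Launch one future per item (doall forces them all to start),
;; then block on each result with deref.
(let [tasks (doall (map (fn [i] (future (* i i))) (range 10)))]
  (mapv deref tasks))
;; => [0 1 4 9 16 25 36 49 64 81]
```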

amalloy
  • i was hoping that the thread pool for send was smaller than 32 (but i don't know if it is). will look at futures, thanks. – andrew cooke Jun 10 '12 at 19:06
  • is it possible to block on a group of futures until one is available? if so, i think that would give the fine-grained control i would like... – andrew cooke Jun 10 '12 at 19:09
  • Send's threadpool is roughly the number of processors you have. Future uses an unlimited threadpool, but pmap avoids spinning up too many at a time. – amalloy Jun 10 '12 at 19:45
  • afaict (see answer linked to in question above) pmap uses 32 (it chunks). – andrew cooke Jun 10 '12 at 23:57
  • Only if your input sequence is chunked. – amalloy Jun 11 '12 at 01:29
  • ok, i checked the source and it looks like 2 + no of processors for pmap. but it looks more like chunks than a thread pool (i suspect it waits until the slowest of 6 - in my case - terminates). – andrew cooke Jun 11 '12 at 04:23
  • `pmap` will keep a reference to the head of the collection won't it? this can be a problem with a huge collection – matanster Mar 23 '18 at 22:20
  • @matanster No, not the whole collection. Just the items in it that haven't yet been mapped over. – amalloy Mar 23 '18 at 22:28
  • @amalloy thanks, couldn't find any documentation on it being different than `map` in that regard. Does that mean that on the get go it realizes the entire collection, then discards from memory what's been processed?! – matanster Mar 23 '18 at 23:03
  • No, it's the same as map. Neither of them holds the head of any collections. – amalloy Mar 24 '18 at 06:10

I had a similar problem with the following requirements:

  1. Have control over the number of threads used;
  2. Be agnostic about the management of the thread pool;
  3. Order of the tasks need not be kept;
  4. Processing times of the tasks can differ, so ordering must not be enforced: a task which finishes earlier should be returned earlier;
  5. Evaluate and submit the input sequence lazily;
  6. Elements in the input sequence should not be read out of bounds, but should be buffered and read in line with the returned results, to avoid out-of-memory issues.

The core pmap function only satisfies the last two requirements.

Here is an implementation which does satisfy those assumptions, using a standard Java thread pool ExecutorService together with a CompletionService and some partitioning of the input stream:

(require '[clojure.tools.logging :as log])

(import [java.util.concurrent ExecutorService ExecutorCompletionService 
                              CompletionService Future])

(defn take-seq
  [^CompletionService pool]
  (lazy-seq
   (let [^Future result (.take pool)]
     (cons (.get result)
           (take-seq pool)))))

(defn qmap
  [^ExecutorService pool chunk-size f coll]
  (let [worker (ExecutorCompletionService. pool)]
    (mapcat
     (fn [chunk]
       (let [actual-size (atom 0)]
         (log/debug "Submitting payload for processing")
         (doseq [item chunk]
           (.submit worker #(f item))
           (swap! actual-size inc))
         (log/debug "Outputting completed results for" @actual-size "tasks")
         (take @actual-size (take-seq worker))))
     (partition-all chunk-size coll))))

As can be seen, qmap does not instantiate the thread pool itself, but only the ExecutorCompletionService. This allows, for example, passing in a fixed-size thread pool (such as one created by Executors/newFixedThreadPool). Also, since qmap returns a lazy sequence, it cannot and must not manage the thread pool resource itself. Finally, chunk-size limits how many elements of the input sequence are realized and submitted as tasks at once.

The code below demonstrates proper usage:

(import [java.util.concurrent Executors])

(let [thread-pool (Executors/newFixedThreadPool 3)]
  (try
    (doseq [result (qmap thread-pool
                         ;; submit no more than 500 tasks at once
                         500 
                         long-running-resource-intensive-fn
                         unboundedly-large-lazy-input-coll)]
      (println result))
    (finally
      ;; (.shutdown) only prohibits submitting new tasks,
      ;; (.shutdownNow) will even cancel already submitted tasks.
      (.shutdownNow thread-pool))))

See the JDK documentation for the Java concurrency classes used: ExecutorService, ExecutorCompletionService, and Future.

Daniel Dinnyes

Not sure if it is idiomatic, as I'm still quite a beginner with Clojure, but the following solution works for me and it also looks pretty concise:

(let [number-of-threads 3
      await-timeout 1000]
  (doseq [p-items (partition number-of-threads items)]
    (let [agents (map agent p-items)]
      (doseq [a agents] (send-off a process))
      (apply await-for await-timeout agents)
      (map deref agents))))
Marco Lazzeri
  • thanks, but doesn't it have similar issues to mikera's answer? in that things are not running fluidly in parallel, but instead in batches (which is less efficient). and, i think, it's less efficient that mikeras, since it has more batches. in other words, it has a fixed mapping from processes to cpus and cannot adapt efficiently to the different running times of different processes. – andrew cooke Jun 19 '12 at 12:19
  • Absolutely: it has a fixed mapping (3 threads in the above example). I thought that was what you wanted when you asked for "there always to be 3 threads running". If you want the system to optimize it for you, instead, I would then certainly use "pmap". – Marco Lazzeri Jun 19 '12 at 17:22
  • when i said i always want 3 thread running i meant that i always wanted 3 threads running. maybe i am not understanding, but it seems to me that there will be times in your solution when only 1 or 2 threads are running). if i use pmap then there are 6 threads running. it's very kind of you to point me to other answers but they don't seem to actually have three threads running, which is what i am asking for... – andrew cooke Jun 19 '12 at 18:05

Use pipelines and channels (core.async). If your operations are IO-bound, that is a preferable option, as pmap's pool is bounded by the number of CPUs.
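A sketch of the pipeline approach, assuming the org.clojure/core.async dependency (`to-chan!` is `to-chan` in older core.async versions):

```clojure
(require '[clojure.core.async :as a])

(let [in  (a/to-chan! (range 100))  ; feeds the input onto a channel
      out (a/chan 10)]
  ;; 3 = degree of parallelism; pipeline-blocking uses 3 dedicated
  ;; threads, suitable for blocking/IO-bound work, and its output
  ;; preserves input order.
  (a/pipeline-blocking 3 out (map (fn [i] (* i i))) in)
  (a/<!! (a/into [] out)))
```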

Another good option is to use an agent along with send-off, which uses a cached thread pool executor underneath.

Den Roman