
I haven't used multithreading in Clojure at all so am unsure where to start.

I have a doseq whose body can run in parallel. What I'd like is for there always to be 3 threads running (leaving 1 core free) that evaluate the body in parallel until the range is exhausted. There's no shared state, nothing complicated - the equivalent of Python's multiprocessing would be just fine.

So something like:

(dopar 3 [i (range 100)]
  ; repeated 100 times in 3 parallel threads...
  ...)

Where should I start looking? Is there a command for this? A standard package? A good reference?

So far I have found pmap, and could use that (how do I restrict it to 3 at a time? It looks like it uses 32 at a time - no, the source says 2 + the number of processors), but this seems like a basic primitive that should already exist somewhere.

clarification: I really would like to control the number of threads. My processes are long-running and use a fair amount of memory, so creating a large number and hoping things work out OK isn't a good approach (for example, each one uses a significant chunk of available memory).

update: Starting to write a macro that does this, and I need a semaphore (or a mutex, or an atom i can wait on). Do semaphores exist in Clojure? Or should I use a ThreadPoolExecutor? It seems odd to have to pull so much in from Java - I thought parallel programming in Clojure was supposed to be easy... Maybe I am thinking about this completely the wrong way? Hmmm. Agents?

andrew cooke

7 Answers


OK, I think what I want is an agent for each loop iteration, with the data sent to the agent using send. Agents triggered with send run on a fixed thread pool, so the number of threads is limited in some way (it doesn't give the fine-grained control of exactly three threads, but it'll have to do for now).

[Dave Ray explains in comments: to control pool size I'd need to write my own]

(defmacro dopar [seq-expr & body]
  (assert (= 2 (count seq-expr)) "single pair of forms in sequence expression")
  (let [[k v] seq-expr]
    `(apply await
       (for [k# ~v]
         (let [a# (agent k#)]
           (send a# (fn [~k] ~@body))
           a#)))))

which can be used like:

(deftest test-dump
  (dopar [n (range 7 11)]
    (time (do-dump-single "/tmp/single" "a" n 10000000))))

Yay! Works! I rock! (OK, Clojure rocks a little bit too). Related blog post.

andrew cooke
  • If you want to control the thread pool you'll need to construct your own. Clojure makes concurrency simpler by providing tools for reducing mutation (default immutability, stm, etc), but out-of-the-box, it expects you to defer to java.util.concurrent if you need fine-grained threading control beyond what agents and futures provide. – Dave Ray Jun 10 '12 at 18:22
  • yeah, thanks. i found a blog post from someone showing how to do that (would post it, but lost it again.). – andrew cooke Jun 10 '12 at 18:30

There's actually a library now for doing exactly this. From its GitHub:

The claypoole library provides threadpool-based parallel versions of Clojure functions such as pmap, future, and for.

It provides both ordered and unordered versions of these functions.
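For instance, a minimal sketch assuming the com.climate.claypoole dependency is on the classpath:

```clojure
(require '[com.climate.claypoole :as cp])

;; A pool of exactly 3 threads; with-shutdown! closes it when the body exits.
(cp/with-shutdown! [pool (cp/threadpool 3)]
  ;; cp/pmap keeps input order; cp/upmap is the unordered variant that
  ;; yields results as they complete. Force the lazy result while the
  ;; pool is still alive.
  (doall (cp/pmap pool #(* % %) (range 10))))
```

There is also `cp/pdoseq` for the side-effecting `doseq` style asked about in the question.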

divs1210
  • How sad, after all the concurrency hyping of clojure... can't do a lot without either falling back to Java or that extra library – matanster Mar 23 '18 at 22:22
  • Took me a while to figure out that the parallel version of `doseq` in the claypoole library is called `pdoseq`. – wedesoft Mar 16 '21 at 21:33

pmap will actually work fine in most circumstances - it uses a thread pool with a sensible number of threads for your machine. I wouldn't bother trying to create your own mechanisms to control the number of threads unless you have real benchmark evidence that the defaults are causing a problem.

Having said that, if you really want to limit to a maximum of three threads, an easy approach is to just use pmap on 3 subsets of the range:

(defn split-equally
  "Split a collection into a vector of (as close as possible) equally sized parts"
  [num coll]
  (loop [num num
         parts []
         coll coll
         c (count coll)]
    (if (<= num 0)
      parts
      (let [t (quot (+ c num -1) num)]
        (recur (dec num) (conj parts (take t coll)) (drop t coll) (- c t))))))

(defmacro dopar [thread-count [sym coll] & body]
  `(doall (pmap
            (fn [vals#]
              (doseq [~sym vals#]
                ~@body))
            (split-equally ~thread-count ~coll))))

Note the use of doall, which is needed to force evaluation of the pmap (which is lazy).
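With the definitions above loaded, usage matches the shape asked for in the question (a sketch with a trivial side-effecting body):

```clojure
;; 100 iterations run on 3 threads, each thread handling one
;; contiguous chunk produced by split-equally.
(dopar 3 [i (range 100)]
  (println i))
```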

mikera
  • this is good, thanks, but i think it assumes all tasks last the same time (which they don't). – andrew cooke Jun 11 '12 at 04:21
  • ...or that there are so many that the law of averages smooths things out. i'm not trying to be difficult, but this would not be optimal either. – andrew cooke Jun 11 '12 at 04:27
  • `dorun` is a good choice over `doall` if you don't want to hold a reference to the head (less memory usage) – jocull Dec 07 '14 at 16:30

Why don't you just use pmap? You still can't control the threadpool, but it's a lot less work than writing a custom macro that uses agents (why not futures?).
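For reference, a sketch of the futures approach (note: each `future` runs on Clojure's unbounded send-off pool, so this does not cap concurrency at 3):

```clojure
;; Launch one future per item (doall forces them all to start),
;; then block on each result with deref.
(let [tasks (doall (map (fn [i] (future (* i i))) (range 10)))]
  (mapv deref tasks))
;; => [0 1 4 9 16 25 36 49 64 81]
```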

amalloy
  • i was hoping that the thread pool for send was smaller than 32 (but i don't know if it is). will look at futures, thanks. – andrew cooke Jun 10 '12 at 19:06
  • is it possible to block on a group of futures until one is available? if so, i think that would give the fine-grained control i would like... – andrew cooke Jun 10 '12 at 19:09
  • Send's threadpool is roughly the number of processors you have. Future uses an unlimited threadpool, but pmap avoids spinning up too many at a time. – amalloy Jun 10 '12 at 19:45
  • afaict (see answer linked to in question above) pmap uses 32 (it chunks). – andrew cooke Jun 10 '12 at 23:57
  • Only if your input sequence is chunked. – amalloy Jun 11 '12 at 01:29
  • ok, i checked the source and it looks like 2 + no of processors for pmap. but it looks more like chunks than a thread pool (i suspect it waits until the slowest of 6 - in my case - terminates). – andrew cooke Jun 11 '12 at 04:23
  • `pmap` will keep a reference to the head of the collection won't it? this can be a problem with a huge collection – matanster Mar 23 '18 at 22:20
  • @matanster No, not the whole collection. Just the items in it that haven't yet been mapped over. – amalloy Mar 23 '18 at 22:28
  • @amalloy thanks, couldn't find any documentation on it being different than `map` in that regard. Does that mean that on the get go it realizes the entire collection, then discards from memory what's been processed?! – matanster Mar 23 '18 at 23:03
  • No, it's the same as map. Neither of them holds the head of any collections. – amalloy Mar 24 '18 at 06:10

I had a similar problem with the following requirements:

  1. Have control over the number of threads used;
  2. Be agnostic about the management of the thread pool;
  3. Order of the tasks need not be kept;
  4. Processing times of the tasks can differ, so ordering must not be enforced: a task which finishes earlier should be returned earlier;
  5. Evaluate and submit the input sequence lazily;
  6. Elements in the input sequence should not be read out of bounds, but should be buffered and read in line with the returned results, to avoid out-of-memory issues.

The core pmap function only satisfies the last two requirements.

Here is an implementation which does satisfy those assumptions, using a standard Java thread pool ExecutorService together with a CompletionService and some partitioning of the input stream:

(require '[clojure.tools.logging :as log])

(import [java.util.concurrent ExecutorService ExecutorCompletionService 
                              CompletionService Future])

(defn take-seq
  [^CompletionService pool]
  (lazy-seq
   (let [^Future result (.take pool)]
     (cons (.get result)
           (take-seq pool)))))

(defn qmap
  [^ExecutorService pool chunk-size f coll]
  (let [worker (ExecutorCompletionService. pool)]
    (mapcat
     (fn [chunk]
       (let [actual-size (atom 0)]
         (log/debug "Submitting payload for processing")
         (doseq [item chunk]
           (.submit worker #(f item))
           (swap! actual-size inc))
         (log/debug "Outputting completed results for" @actual-size "tasks")
         (take @actual-size (take-seq worker))))
     (partition-all chunk-size coll))))

As can be seen, qmap does not instantiate the thread pool itself, but only the ExecutorCompletionService. This allows, for example, passing in a fixed-size thread pool (such as one created by Executors/newFixedThreadPool). Also, since qmap returns a lazy sequence, it cannot and must not manage the thread pool resource itself. Finally, chunk-size limits how many elements of the input sequence are realized and submitted as tasks at once.

The code below demonstrates proper usage:

(import [java.util.concurrent Executors])

(let [thread-pool (Executors/newFixedThreadPool 3)]
  (try
    (doseq [result (qmap thread-pool
                         ;; submit no more than 500 tasks at once
                         500 
                         long-running-resource-intensive-fn
                         unboundedly-large-lazy-input-coll)]
      (println result))
    (finally
      ;; (.shutdown) only prohibits submitting new tasks,
      ;; (.shutdownNow) will even cancel already submitted tasks.
      (.shutdownNow thread-pool))))

See the JDK documentation for the Java concurrency classes used: ExecutorService, ExecutorCompletionService, and Future.

Daniel Dinnyes

Not sure if it is idiomatic, as I'm still quite a beginner with Clojure, but the following solution works for me and it also looks pretty concise:

(let [number-of-threads 3
      await-timeout 1000]
  (doseq [p-items (partition number-of-threads items)]
    (let [agents (map agent p-items)]
      (doseq [a agents] (send-off a process))
      (apply await-for await-timeout agents)
      (map deref agents))))
Marco Lazzeri
  • thanks, but doesn't it have similar issues to mikera's answer? in that things are not running fluidly in parallel, but instead in batches (which is less efficient). and, i think, it's less efficient that mikeras, since it has more batches. in other words, it has a fixed mapping from processes to cpus and cannot adapt efficiently to the different running times of different processes. – andrew cooke Jun 19 '12 at 12:19
  • Absolutely: it has a fixed mapping (3 threads in the above example). I thought that was what you wanted when you asked for "there always to be 3 threads running". If you want the system to optimize it for you, instead, I would then certainly use "pmap". – Marco Lazzeri Jun 19 '12 at 17:22
  • when i said i always want 3 thread running i meant that i always wanted 3 threads running. maybe i am not understanding, but it seems to me that there will be times in your solution when only 1 or 2 threads are running). if i use pmap then there are 6 threads running. it's very kind of you to point me to other answers but they don't seem to actually have three threads running, which is what i am asking for... – andrew cooke Jun 19 '12 at 18:05

Use pipelines and channels (core.async). If your operations are IO-bound, that is a preferable option, as pmap's pool is bounded by the number of CPUs.
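A sketch of the pipeline approach, assuming the org.clojure/core.async dependency (`to-chan!` is `to-chan` in older core.async versions):

```clojure
(require '[clojure.core.async :as a])

(let [in  (a/to-chan! (range 100))  ; feeds the input onto a channel
      out (a/chan 10)]
  ;; 3 = degree of parallelism; pipeline-blocking uses 3 dedicated
  ;; threads, suitable for blocking/IO-bound work, and its output
  ;; preserves input order.
  (a/pipeline-blocking 3 out (map (fn [i] (* i i))) in)
  (a/<!! (a/into [] out)))
```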

Another good option is to use an agent along with send-off, which uses a cached thread pool executor underneath.

Den Roman