
As far as I know, `pmap` in Clojure works just like `map`, but it calculates results in parallel, using futures under the hood. So if `map` works with a given function and sequence, `pmap` should "just work" with them too (unless there are evil side effects that prevent it, but my program does nothing more than load data from an HTTP server and transform it).
And in my case pmap doesn't work as expected. Why can this happen?
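A rough model of "`map` plus futures" may help frame the question. The sketch below defines a hypothetical `naive-pmap` (not the real `clojure.core/pmap`, which is semi-lazy and only stays a limited number of elements ahead of the consumer): it starts one future per element and derefs them in order. It also shows that an exception thrown inside the mapped function comes back wrapped in `java.util.concurrent.ExecutionException` when the future is dereferenced, which matches the outer exception in the trace below.

```clojure
(defn naive-pmap
  "Simplified model of pmap: start one future per element, then deref them in order.
  Unlike clojure.core/pmap, it starts every future up front."
  [f coll]
  (let [futures (doall (map #(future (f %)) coll))]
    (map deref futures)))

;; For a side-effect-free function the results match map:
(= (map inc (range 5)) (naive-pmap inc (range 5)) (pmap inc (range 5))) ; => true

;; An exception thrown inside f resurfaces on deref, wrapped in ExecutionException;
;; the original exception is available as the cause.
(try
  (doall (naive-pmap (fn [x] (if (= x 2) (throw (ClassCastException. "boom")) x))
                     (range 5)))
  (catch java.util.concurrent.ExecutionException e
    (class (.getCause e))))
;; => java.lang.ClassCastException
```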

The problem arises here (if I change map to pmap): https://github.com/magicgoose/DvachMaster/blob/master/src/dvach/core.clj#L82

(defn thread-list
  "load threads from all pages, trying each page at most `max-trials` times with `retry-interval`"
  [board]
  (try
    (let [p0 (load-body (board-addr board 0))
          numpages (count (:pages p0))
          other-pages (map                    ; problem here when changed to pmap
                        (comp
                          load-body
                          (partial board-addr board))
                        (range 1 numpages))
          all-pages (cons p0 other-pages)]
      (doall
        ((comp (partial reduce concat) (partial map :threads)) all-pages)))
    (catch Throwable e
      (.printStackTrace e))))
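To check that the one-word change from `map` to `pmap` is harmless in itself, here is a sketch with hypothetical stand-ins for `board-addr` and `load-body` (canned data instead of HTTP and JSON; the URL scheme is illustrative only). It drops the `try`/`catch` for brevity and uses `mapcat :threads`, which yields the same threads in the same order as the `(comp (partial reduce concat) (partial map :threads))` composition above.

```clojure
;; Hypothetical stand-ins: the real dvach.core versions fetch over HTTP and parse JSON.
(defn board-addr
  "Builds a page URL for `board` (illustrative URL scheme only)."
  [board page]
  (str "http://example.com/" board "/" page ".json"))

(defn load-body
  "Returns canned data shaped like a parsed page: a :pages list and a :threads list."
  [addr]
  {:pages   [0 1 2]
   :threads [(str addr "#thread")]})

(defn thread-list
  "Loads page 0, then the remaining pages in parallel, and concatenates their threads."
  [board]
  (let [p0          (load-body (board-addr board 0))
        numpages    (count (:pages p0))
        other-pages (pmap (comp load-body (partial board-addr board))
                          (range 1 numpages))]
    ;; mapcat gives the same threads, in the same order, as
    ;; ((comp (partial reduce concat) (partial map :threads)) ...)
    (doall (mapcat :threads (cons p0 other-pages)))))

(thread-list "b")
;; => ("http://example.com/b/0.json#thread"
;;     "http://example.com/b/1.json#thread"
;;     "http://example.com/b/2.json#thread")
```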

The exception I get:

java.util.concurrent.ExecutionException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.util.concurrent.Future
    at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
    at java.util.concurrent.FutureTask.get(Unknown Source)
    at clojure.core$deref_future.invoke(core.clj:2108)
    at clojure.core$future_call$reify__6267.deref(core.clj:6308)
    at clojure.core$deref.invoke(core.clj:2128)
    at clojure.core$pmap$step__6280$fn__6282.invoke(core.clj:6358)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$map$fn__4207.invoke(core.clj:2479)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core.protocols$fn__6034.invoke(protocols.clj:146)
    at clojure.core.protocols$fn__6005$G__6000__6014.invoke(protocols.clj:19)
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:27)
    at clojure.core.protocols$fn__6026.invoke(protocols.clj:53)
    at clojure.core.protocols$fn__5979$G__5974__5992.invoke(protocols.clj:13)
    at clojure.core$reduce.invoke(core.clj:6175)
    at clojure.lang.AFn.applyToHelper(AFn.java:163)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at clojure.core$apply.invoke(core.clj:619)
    at clojure.core$partial$fn__4190.doInvoke(core.clj:2396)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at clojure.core$comp$fn__4154.invoke(core.clj:2331)
    at dvach.core$thread_list.invoke(core.clj:91)
    at dvach.core$eval3813.invoke(NO_SOURCE_FILE:2)
    at clojure.lang.Compiler.eval(Compiler.java:6619)
    at clojure.lang.Compiler.eval(Compiler.java:6582)
    at clojure.core$eval.invoke(core.clj:2852)
    at clojure.main$repl$read_eval_print__6588$fn__6591.invoke(main.clj:259)
    at clojure.main$repl$read_eval_print__6588.invoke(main.clj:259)
    at clojure.main$repl$fn__6597.invoke(main.clj:277)
    at clojure.main$repl.doInvoke(main.clj:277)
    at clojure.lang.RestFn.invoke(RestFn.java:1096)
    at clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__1023.invoke(interruptible_eval.clj:56)
    at clojure.lang.AFn.applyToHelper(AFn.java:159)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at clojure.core$apply.invoke(core.clj:617)
    at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1788)
    at clojure.lang.RestFn.invoke(RestFn.java:425)
    at clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke(interruptible_eval.clj:41)
    at clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__1064$fn__1067.invoke(interruptible_eval.clj:171)
    at clojure.core$comp$fn__4154.invoke(core.clj:2330)
    at clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__1057.invoke(interruptible_eval.clj:138)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.util.concurrent.Future
    at clojure.core$deref_future.invoke(core.clj:2108)
    at clojure.core$deref.invoke(core.clj:2129)
    at dvach.core$load_body.invoke(core.clj:74)
    at clojure.core$comp$fn__4154.invoke(core.clj:2331)
    at clojure.core$pmap$fn__6275$fn__6276.invoke(core.clj:6354)
    at clojure.core$binding_conveyor_fn$fn__4107.invoke(core.clj:1836)
    at clojure.lang.AFn.call(AFn.java:18)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    ... 3 more
Display Name
  • Yes, "this happens reliably when you're using pmap and never with map": this was true. (But it need not be true every time.) – Display Name May 18 '13 at 17:37

1 Answer


The problem the stack trace complains about is with `@max-trials` on line 74; this should read `max-trials` instead. (`max-trials` is a loop variable initialized to `@retry-count` on line 66, so it is already a number, decremented on each iteration. Applying `@` to a number makes `deref` treat it as a `java.util.concurrent.Future`, hence the `ClassCastException`.)

It may well arise intermittently, since that point in the code is only reached if the try block starting on line 68 fails to fetch the result.
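A minimal sketch of the fix, assuming a retry loop shaped like the one described above. The real `load-body` is not shown in the question, so `retry-count`, `retry-interval`, and the `fetch` argument are hypothetical. The atom is dereferenced once when the loop starts; after that, `max-trials` is a plain number and must be used without `@`.

```clojure
;; Hypothetical settings; the real dvach.core values are not shown in the question.
(def retry-count (atom 3))     ; maximum number of attempts, kept in an atom
(def retry-interval 10)        ; milliseconds to sleep between attempts

(defn load-body
  "Calls `fetch` (a hypothetical zero-argument function that may throw)
  until it succeeds or the attempts run out, then rethrows the last error."
  [fetch]
  (loop [max-trials @retry-count]            ; deref the atom once, here
    ;; recur is not allowed inside try, so capture the outcome as data first.
    (let [outcome (try {:ok (fetch)}
                       (catch Exception e {:error e}))]
      (cond
        (contains? outcome :ok) (:ok outcome)
        ;; max-trials is already a plain Long: compare and decrement it directly.
        ;; Writing @max-trials here is the bug from the question.
        (> max-trials 1) (do (Thread/sleep retry-interval)
                             (recur (dec max-trials)))
        :else (throw (:error outcome))))))

;; Dereferencing a number fails the same way as in the stack trace:
(try (deref 3) (catch ClassCastException e :class-cast)) ; => :class-cast
```

Capturing the outcome as a map keeps `recur` outside the `try` form, which the Clojure compiler requires.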

Michał Marczyk
  • If this happens reliably when you're using `pmap` and never with `map`, then please say so in a comment / update to the question. – Michał Marczyk May 18 '13 at 17:06
  • Yes, "this happens reliably when you're using pmap and never with map" - this is true. – Display Name May 18 '13 at 17:16
  • Yes, that was it. I made a stupid error and was too lazy to read the stack trace carefully. WTF! I'm sorry for wasting your time. – Display Name May 18 '13 at 17:20
  • I thought that the `catch` block was almost certainly reached in any case, since the server sometimes gives bad responses. But I was wrong. – Display Name May 18 '13 at 17:23
  • No worries. Given that the problem reliably comes up with `pmap`, it may well be that the http client refuses to make connections in parallel; in that case, fixing the `max-trials` issue would only mask the issue (there would always be retries). The default connection manager in clj-http is indeed the `BasicClientConnectionManager`, of which the documentation says that, while thread-safe, it is only meant to be used by one thread at a time. So while you can't put things in an inconsistent state by using it from multiple threads, you might not be able to make parallel requests with it. – Michał Marczyk May 18 '13 at 18:05
  • You can pass your own connection manager to `http/get`; there's `clj-http.conn-mgr/make-reusable-conn-manager` for making `PoolingClientConnectionManager` instances. Also, you may want to investigate the number of retries which actually happen, perhaps by having `load-body` increment a counter stored in an atom somewhere in some sort of debug mode (controlled by a debug mode switch somewhere, or simply commenting out the relevant call for a quick & dirty approach). This will let you confirm that you're actually achieving the parallel speedup you're hoping for. – Michał Marczyk May 18 '13 at 18:07
  • As a final note, `pmap` as currently implemented always spins up futures in blocks of 32. See my answer to an earlier SO question [here](http://stackoverflow.com/questions/9065148/pmap-and-thread-count/9065484#9065484) for details. – Michał Marczyk May 18 '13 at 18:14
  • Actually, about that `pmap` remark above: because of that issue, I think you'd likely be better off with a pooling connection manager in any case. Also, after a proper look at clj-http, a new `BasicClientConnectionManager` seems to be created for each request, so there should be no concurrency issue after all (but I've only glanced at the HttpClient code, so I'm not claiming to know for sure -- counting retries, as suggested above, should give you the answer). – Michał Marczyk May 18 '13 at 18:30
  • But then why IS loading faster with `pmap`, despite some extra retries? In my case it's about 1 second vs 8 seconds. – Display Name May 18 '13 at 18:40
  • Ah, ok. Guess separate `BasicClientConnectionManager` instances are fine then. Thanks for posting the timings! A pooled connection manager might be even faster (it maintains persistent connections to hosts and lets you control concurrency on per-host and per-connection-manager level), but I guess it might not be worth the hassle if you're already seeing a good speedup. – Michał Marczyk May 18 '13 at 18:53
  • I'd like to make it even faster, but for now 1-2 seconds is OK; I need to finish other things first. And thanks again for the answers! – Display Name May 18 '13 at 18:54
  • Hmm, I forgot to mention: here everything is done in parallel, so maybe the speedup comes not from faster fetching but from faster JSON parsing. I'll need to test this too... – Display Name May 18 '13 at 18:57
  • But that may not be straightforward, because the JSON parser can throw exceptions too (if given bogus data), and then the data must be retrieved again. – Display Name May 18 '13 at 19:05
  • Good point! Well, if you're planning to add logging to your app, you could just do it now and read off causes for retries from the logs. Incidentally, I believe [Cheshire](https://github.com/dakrone/cheshire) is faster than data.json. (There's also my library -- [clj-lazy-json](https://github.com/michalmarczyk/clj-lazy-json) -- which uses [Jackson](http://jackson.codehaus.org/) under the hood, like Cheshire, but provides a different model of consuming JSON. I mention it for completeness, but I imagine Cheshire will be more useful for your purposes.) – Michał Marczyk May 18 '13 at 19:07
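The retry counter suggested in the comments can be sketched as below. `debug?`, `retry-counter`, `note-retry!` and `load-page` are hypothetical names, and `load-page` simulates a bad first response for every odd page instead of doing HTTP. Because `swap!` is atomic, increments from several `pmap` worker threads are never lost. The same atom trick also makes the chunking behind "futures in blocks of 32" visible: `pmap` creates its futures with `map`, and `map` over a chunked sequence such as `(range 100)` realizes a whole 32-element chunk as soon as the first element is requested.

```clojure
;; Hypothetical debug switch and counter (names are illustrative, not from dvach.core).
(def debug? true)
(def retry-counter (atom 0))

(defn note-retry!
  "Increments the retry counter when debug mode is on. swap! is atomic,
  so calls from several pmap worker threads are all counted."
  []
  (when debug?
    (swap! retry-counter inc)))

(defn load-page
  "Hypothetical loader: the first attempt for every odd page fails,
  standing in for a bad server response."
  [page]
  (loop [attempt 1]
    (if (and (odd? page) (= attempt 1))
      (do (note-retry!)
          (recur (inc attempt)))
      {:page page :attempts attempt})))

;; Load ten pages in parallel, then read off the number of retries.
(reset! retry-counter 0)
(def pages (doall (pmap load-page (range 10))))
@retry-counter ; => 5

;; Counting calls the same way shows chunking: asking a lazy map over a
;; chunked range for one element applies the function to a whole chunk.
(def calls (atom 0))
(first (map (fn [x] (swap! calls inc) x) (range 100)))
@calls ; => 32
```

Timing the same `pmap` call with `debug?` on and off would then show how many retries the parallel version actually costs.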