My high-level understanding is that a transducer does not create intermediate data structures, whereas a long chain of operations via ->> does, and that the transducer version should therefore be faster. This holds true in the first of my examples below. However, when I add clojure.core.async/chan to the mix, I do not get the performance improvement I expect. Clearly there is something I don't understand, and I would appreciate an explanation.
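For concreteness, here is a minimal sketch of the mechanism as I understand it (separate from the benchmark code below): composing map transducers produces a single reducing function, so each element flows through every step in one pass, with no intermediate sequences.

;; Minimal sketch: (comp (map f) (map g)) wraps one reducing function;
;; each element is threaded through f and then g in a single pass.
(let [rf ((comp (map inc) (map inc)) conj)]
  (reduce rf [] [1 2 3])) ; => [3 4 5]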
(ns dev
  (:require [clojure.core.async :as async]
            [criterium.core :as crit]))

;; Set up some toy data.
(def n 1e6)
(def data (repeat n "1"))

;; Reusable thread-last operation (the "slower" one).
(defn tx [x]
  (->> x
       (map #(Integer. %))
       (map inc) (map inc) (map inc) (map inc) (map inc) (map inc)
       (map inc) (map inc) (map inc) (map inc) (map inc)))

;; Reusable transducer (the "faster" one).
(def xf (comp
         (map #(Integer. %))
         (map inc) (map inc) (map inc) (map inc) (map inc) (map inc)
         (map inc) (map inc) (map inc) (map inc) (map inc)))

;; For these first two I expect the second to be faster, and it is.
(defn nested []
  (last (tx data)))

(defn into-xf []
  (last (into [] xf data)))
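(One aside on the harness, which may or may not matter here: on a vector, last walks a seq while peek is constant-time, so a variant like the following hypothetical one would keep the measurement focused on the pipeline rather than on the final element access.)

;; Hypothetical variant: peek is O(1) on a vector, last is O(n).
(defn into-xf-peek []
  (peek (into [] xf data)))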
;; For the next two I again expect the second to be faster, but it is NOT.
(defn chan-then-nested []
  (let [c (async/chan n)]
    (async/onto-chan! c data)
    (->> c
         (async/into [])
         async/<!!
         tx
         last)))

(defn chan-xf []
  (let [c (async/chan n xf)]
    (async/onto-chan! c data)
    (->> c
         (async/into [])
         async/<!!
         last)))
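As a way of framing the question, here is one more variant (a sketch under the same setup; I am not reporting numbers for it): run the transducer eagerly before the channel, so the channel only ferries already-transformed values. If the cost of chan-xf really is per-put overhead, this should behave more like chan-then-nested.

;; Sketch: apply xf eagerly, then use the channel purely for transport.
(defn xf-then-chan []
  (let [c (async/chan n)]
    (async/onto-chan! c (into [] xf data))
    (->> c
         (async/into [])
         async/<!!
         last)))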
(comment
  (crit/quick-bench (nested))           ; 1787.672 ms
  (crit/quick-bench (into-xf))          ;  822.8626 ms
  (crit/quick-bench (chan-then-nested)) ; 1535.628 ms
  (crit/quick-bench (chan-xf))          ; 2072.626 ms

  ;; Expected ranking, fastest to slowest:
  ;; into-xf
  ;; nested
  ;; chan-xf
  ;; chan-then-nested

  ;; Actual ranking, fastest to slowest:
  ;; into-xf
  ;; chan-then-nested
  ;; nested
  ;; chan-xf
  )
In the end there are two results I don't understand. First, why is using a transducer with a channel slower than reading everything off the channel and then doing the nested maps? It appears that the overhead, whatever it is, of using a transducer with a channel is so large that it overwhelms the gain of not creating intermediate data structures. Second, and this one was really unexpected, why is it faster to put the data onto a channel, take it all off, and then use the nested-map technique than it is to skip the channel dance and just use the nested-map technique? (Said shorter: why is chan-then-nested faster than nested?) Could some or all of this just be an artifact of the benchmarking, or randomness? (I have run quick-bench several times for each of these with the same results.) I'm wondering if it has something to do with into calling transduce, whereas the channel version is not implemented the same way at all. A transducer provides the same interface for applying a transformation across vectors or channels, but how that transformation is applied differs, and that difference makes all the difference.
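To make that suspicion concrete, here is my rough mental model of the two code paths (a simplified sketch based on my reading of the clojure.core and core.async sources; the channel part is paraphrased in comments, not actual implementation code):

;; into with a transducer reduces eagerly over the whole input in one
;; tight loop; this is essentially what clojure.core/into does:
(last (persistent! (transduce xf conj! (transient []) data)))

;; A channel built with (async/chan n xf), as I understand it, instead
;; applies the composed reducing function once per put, inside the
;; channel's internal lock, with buffer bookkeeping on every element
;; (and onto-chan! adds go-block scheduling per put), so the constant
;; factor per element is far larger even though the intermediate
;; sequences are gone.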