Clojure: building collections using `for` bindings

Question

I'm still fairly new to Clojure, but I often find myself using this pattern: I have some collections and I want to build a new collection out of them, usually a hash-map, with some filters or conditions. There are always a few ways to do this, for example using `loop`, or using `reduce` combined with `map`/`filter`. However, I would like to implement something more like the `for` macro, which has great syntax for controlling what gets evaluated in the loop. I'd like to produce a macro with syntax that goes like this:

(defmacro build
  "(build sym init-val [bindings...] expr) evaluates the given expression expr
   over the given bindings (treated identically to the bindings in a for macro); 
   the first time expr is evaluated the given symbol sym is bound to the init-val
   and every subsequent time to the previous expr. The return value is the result
   of the final expr. In essence, the build macro is to the reduce function
   as the for macro is to the map function.

   Example:
     (build m {} [x (range 4), y (range 4) :when (not= x y)]
       (assoc m x (conj (get m x #{}) y)))
      ;; ==> {0 #{1 3 2}, 1 #{0 3 2}, 2 #{0 1 3}, 3 #{0 1 2}}"
  [sym init-val [& bindings] expr]
  `(...))
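For reference, here is a sketch of what the docstring example computes when written by hand with plain `reduce`. `for` generates the `[x y]` pairs (handling the `:when` filter), and `reduce` folds them into the map. The var name `pairs->sets` is only illustrative.

```clojure
;; Hand-written equivalent of the docstring example (illustrative name).
;; `for` handles the bindings and the :when filter; `reduce` threads the map.
(def pairs->sets
  (reduce (fn [m [x y]]
            ;; add y to the set stored under x, starting from #{}
            (assoc m x (conj (get m x #{}) y)))
          {}
          (for [x (range 4), y (range 4) :when (not= x y)]
            [x y])))
```

The `build` macro is meant to remove the need for the intermediate `[x y]` tuples and the destructuring step.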

Looking at the `for` code in clojure.core, it's pretty clear that I don't want to re-implement its syntax myself (even ignoring the ordinary perils of duplicating code), but coming up with `for`-like behavior in the above macro is a lot trickier than I initially expected. I eventually came up with the following, but I feel that (a) this probably isn't terribly performant and (b) there ought to be a better, still Clojure-y, way to do this:

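(defmacro build
  [sym init-val bindings expr]
  ;; `for` yields one closure (fn [sym] expr) per binding combination;
  ;; the loop threads the accumulator through those closures in order.
  `(loop [result# ~init-val, s# (seq (for ~bindings (fn [~sym] ~expr)))]
     (if s#
       (recur ((first s#) result#) (next s#))
       result#)))
;; or `(reduce #(%2 %1) ~init-val (for ~bindings (fn [~sym] ~expr)))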
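A minimal, self-contained sketch of the `reduce` variant from the comment above, written out as a full macro. The name `build-reduce` is hypothetical, chosen only to keep it apart from the `loop` version. Each element that `for` yields is a one-argument closure over the current bindings, and `reduce` feeds the accumulator through those closures in order.

```clojure
(defmacro build-reduce
  "Reduce-based sketch of `build` (hypothetical name). `for` yields one
  closure per binding combination; reduce applies them in order."
  [sym init-val bindings expr]
  `(reduce (fn [acc# step#] (step# acc#))
           ~init-val
           (for ~bindings (fn [~sym] ~expr))))

;; Docstring example:
(build-reduce m {} [x (range 4), y (range 4) :when (not= x y)]
  (assoc m x (conj (get m x #{}) y)))

;; Because expr is arbitrary, it can also remove keys:
(build-reduce m {} [x [2 3 4 6 7]]
  (if (even? x) (assoc m x true) (dissoc m (dec x))))
```

This allocates one closure per element, which is exactly the overhead the performance question below is asking about.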

My specific questions:

1. Is there a built-in Clojure function or library that solves this already, perhaps more elegantly?
2. Can someone who is more familiar with Clojure performance give me an idea of whether this implementation is problematic, and whether (and how much) I should worry about performance, assuming that I may use this macro very frequently on relatively large collections?
3. Is there any good reason to use the `loop` version of the macro above over the `reduce` version, or vice versa?
4. Can anyone see a better implementation of the macro?

score 1 · Accepted Answer · answered Nov 13 '15 at 14:34


Your reduce version was also my first approach based on the problem statement. I think it's nice and straightforward and I'd expect it to work very well, particularly since for will produce a chunked seq that reduce will be able to iterate over very quickly.
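A quick REPL check of the chunking claim, as a sketch: `chunked-seq?` reports whether a seq is chunked. A `for` over a vector (whose seq is chunked) yields a chunked seq, while a `for` over a `repeatedly` seq, which is built one cons cell at a time, does not. Chunking is an implementation detail of current Clojure versions rather than a documented guarantee.

```clojure
;; Chunked input: the seq of a vector is chunked, so `for` emits chunks.
(def from-vector (seq (for [x (vec (range 100))] (* 2 x))))

;; Unchunked input: `repeatedly` produces plain cons cells.
(def from-repeatedly (seq (for [x (doall (repeatedly 100 rand))] (* 2 x))))

(chunked-seq? from-vector)      ;=> true
(chunked-seq? from-repeatedly)  ;=> false
```

This is consistent with the vector vs. lazy-seq benchmark discussed in the comments below.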

`for` already generates functions to produce its output, so I wouldn't expect the extra layer of closures introduced by the `build` expansion to be particularly problematic. It may still be worthwhile to benchmark this version based on `volatile!` as well:

(defmacro build [sym init-val bindings expr]
  `(let [box# (volatile! ~init-val)] ; AtomicReference would also work
     (doseq ~bindings
       (vreset! box# (let [~sym @box#] ~expr)))
     @box#))
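To see what that macro does without the macro machinery, here is a sketch of the docstring example written directly against a volatile. It uses `vswap!`, which applies a function to the current value and stores the result; that is equivalent to the `vreset!`-of-`let` in the macro. The function name `build-by-hand` is only illustrative.

```clojure
(defn build-by-hand
  "Docstring example with the volatile accumulator written out (illustrative)."
  []
  (let [box (volatile! {})]   ; mutable cell holding the accumulator
    (doseq [x (range 4), y (range 4) :when (not= x y)]
      ;; replace the box contents with (f current-value)
      (vswap! box (fn [m] (assoc m x (conj (get m x #{}) y)))))
    @box))                    ; final accumulated map
```

A volatile is sufficient here because `doseq` runs entirely on the calling thread; the `AtomicReference` alternative would matter only if the box were shared across threads.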

Criterium is great for benchmarking and will eliminate any performance-related guesswork.

answered Nov 13 '15 at 14:34

Michał Marczyk


Thanks, this is very helpful; I'll check out Criterium! – nben Nov 13 '15 at 16:06
FYI, I used Criterium to benchmark the three versions of build (loop, reduce, volatile). The code I used was `(let [R (doall (repeatedly 10000 rand))] (bench (build m #{} [r R] (conj m r))))`. On my desktop, both the loop and reduce versions of build had an average runtime of ~3.8 ms, with the loop version marginally faster in every test (by about 10 µs). The volatile version ran in ~3.1 ms. – nben Nov 13 '15 at 21:30
Cheers! I've run a similar benchmark with a vector instead of a lazy seq as input – so as to iterate over a chunked seq – and the `loop` version was clearly slower and `volatile!` slightly faster than `reduce` (8.68 ms / 7.41 ms / 7.27 ms for `loop` / `reduce` / `volatile!`). – Michał Marczyk Nov 15 '15 at 18:53

ClojureMostly · Answer 2 · 2015-11-13T14:44:31.237


I don't want to take the example code from your docstring quite as-is, since it's not idiomatic Clojure. But starting from plumbing.core's `for-map`, you can come up with a similar `for-map-update`:

(defn update!
  "Like update but for transients."
  ([m k f] (assoc! m k (f (get m k))))
  ([m k f x1] (assoc! m k (f (get m k) x1)))
  ([m k f x1 x2] (assoc! m k (f (get m k) x1 x2)))
  ([m k f x1 x2 & xs] (assoc! m k (apply f (get m k) x1 x2 xs))))

(defmacro for-map-update
  "Like 'for-map' for building maps but accepts a function as the value to build map values."
  ([seq-exprs key-expr val-expr]
   `(for-map-update ~(gensym "m") ~seq-exprs ~key-expr ~val-expr))
  ([m-sym seq-exprs key-expr val-expr]
   `(let [m-atom# (atom (transient {}))]
      (doseq ~seq-exprs
        (let [~m-sym @m-atom#]
          (reset! m-atom# (update! ~m-sym ~key-expr ~val-expr))))
      (persistent! @m-atom#))))

(for-map-update
  [x (range 4)
   y (range 4)
   :when (not= x y)]
  x (fnil #(conj % y) #{}))
;; => {0 #{1 3 2}, 1 #{0 3 2}, 2 #{0 1 3}, 3 #{0 1 2}}
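For comparison, a sketch of the same result built with `reduce` over a transient map and no atom. Each `assoc!` returns the transient to use next, and `reduce` carries that return value forward, which is how transients are meant to be used. This swaps in `reduce` for the `doseq`-plus-atom combination above; it is not plumbing's implementation, and the var name `transient-sets` is hypothetical.

```clojure
(def transient-sets
  (persistent!
    (reduce (fn [tm [x y]]
              ;; assoc! may return a different transient, so always return it
              (assoc! tm x (conj (get tm x #{}) y)))
            (transient {})
            (for [x (range 4), y (range 4) :when (not= x y)]
              [x y]))))
```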

edited Nov 13 '15 at 14:44

answered Nov 13 '15 at 14:38

ClojureMostly


I disagree about it being unidiomatic. It is a custom macro, of course, but it is not at all out of line with Clojure idioms. In fact, the syntax is rather similar to `areduce`. – Michał Marczyk Nov 13 '15 at 15:07
Thanks for the plumbing.core link---been looking for something like this. Can you point me toward a definition of 'idiomatic clojure' or explain what makes a macro idiomatic? Particularly given that your version both requires much more code and supports only some of the functionality of mine (i.e., yours only reduces to maps and doesn't allow, for example, the equivalent of this (relatively common) kind of case: `(build m {} [x some-numbers] (if (even? x) (assoc m x true) (dissoc m (dec x))))`.) – nben Nov 13 '15 at 16:43
To me the `m` is like a mutable variable within a loop. Basically writing imperative code instead of functional. Something not even transients do (you have to capture the return value). That's something you see very very rarely in Clojure code. But yes I agree, your version is more powerful. – ClojureMostly Nov 13 '15 at 17:31
Gotcha, that makes a lot of sense. Though, if I understand you, I believe you are mistaken re: transients; the following works fine, for example: `(let [tr (transient {})] (assoc! tr :a 1) (persistent! tr))` -- yields `{:a 1}`. – nben Nov 13 '15 at 17:50
@user21382 Actually that doesn't work – if you `assoc!` a sufficiently large number of elements on to your transient map, its underlying type will change, `assoc!` will return a different transient instance and the original one will not contain your new entry. See for example http://stackoverflow.com/questions/29684803/why-inserting-1000-000-values-in-a-transient-map-in-clojure-yields-a-map-with-8 – Michał Marczyk Nov 13 '15 at 21:16
@Andre In that case the accumulator (left) argument to `f` in `(reduce f init xs)` could also be described as a mutable variable. (I don't think that would be accurate.) – Michał Marczyk Nov 13 '15 at 21:19
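A small sketch of the transient failure mode described in the comments: calling `assoc!` for its side effect and ignoring the return value. Once enough keys are added, the transient switches representation, so the original reference stops receiving new entries. Threading the return value (here via `reduce`) keeps every entry. Exactly how many entries survive in the broken version is an implementation detail; the linked question reports 8. The var names are hypothetical.

```clojure
;; Wrong: ignores the value returned by assoc!.
(def bashed-in-place
  (let [tr (transient {})]
    (doseq [i (range 20)]
      (assoc! tr i i))   ; return value discarded
    (persistent! tr)))

;; Right: always continue with the transient that assoc! returns.
(def threaded
  (persistent! (reduce (fn [tr i] (assoc! tr i i)) (transient {}) (range 20))))

(count bashed-in-place)  ; fewer than 20
(count threaded)         ; 20
```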
