Clojure partition by filter

Question

In Scala, the partition method splits a sequence into two separate sequences -- those for which the predicate is true and those for which it is false:

scala> List(1, 5, 2, 4, 6, 3, 7, 9, 0, 8).partition(_ % 2 == 0)
res1: (List[Int], List[Int]) = (List(2, 4, 6, 0, 8),List(1, 5, 3, 7, 9))

Note that the Scala implementation only traverses the sequence once.

In Clojure, the partition-by function splits the sequence into multiple sub-sequences, each the longest run of consecutive elements that either all satisfy or all fail the predicate:

user=> (partition-by #(= 0 (rem % 2)) [1, 5, 2, 4, 6, 3, 7, 9, 0, 8])
((1 5) (2 4 6) (3 7 9) (0 8))

while split-with produces:

user=> (split-with #(= 0 (rem % 2)) [1, 5, 2, 4, 6, 3, 7, 9, 0, 8])
[() (1 5 2 4 6 3 7 9 0 8)]

Is there a built-in Clojure function that does the same thing as the Scala partition method?

I'm not actually sure why this is tagged "scala". You are using Scala to give an example of the functionality you want, but otherwise it's entirely a Clojure question. Scala expertise doesn't seem relevant. — Rex Kerr, Apr 14 '11 at 14:48
Why does it have to be built-in? Can't you just call filter twice? — Adrian Mouat, Apr 14 '11 at 14:47
That traverses the sequence twice -- not so good for large sequences. — Ralph, Apr 14 '11 at 15:16
ah, ok. I'll leave this answer as it may still be of interest to others with similar problems. — Adrian Mouat, Apr 14 '11 at 16:14

A. Levy · Accepted Answer · 2019-01-11T16:00:37.533

I believe the function you are looking for is clojure.core/group-by. It returns a map from each key to a vector of the items in the original sequence for which the grouping function returns that key. If you use a predicate that produces true/false, you will get the split that you are looking for.

user=> (group-by even? [1, 5, 2, 4, 6, 3, 7, 9, 0, 8])
{false [1 5 3 7 9], true [2 4 6 0 8]}

If you take a look at the implementation, it fulfills your requirement that it only use one pass. Plus, it uses transients under the hood so it should be faster than the other solutions posted thus far. One caveat is that you should be sure of the keys that your grouping function is producing. If it produces nil instead of false, then your map will list failing items under the nil key. If your grouping function produces non-nil values instead of true, then you could have passing values listed under multiple keys. Not a big problem, just be aware that you need to use a true/false producing predicate for your grouping function.
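
The caveat about keys shows up with a predicate that returns truthy/falsy values rather than strict booleans, such as a set used as a function. The sketch below coerces the predicate's result with boolean and returns a pair; partition-by-pred is a hypothetical helper name, not something in clojure.core:

```clojure
;; A set used as a predicate returns the matched element or nil,
;; so the groups end up under the keys nil, 2 and 4.
(group-by #{2 4} [1 2 3 4])
;; => {nil [1 3], 2 [2], 4 [4]}

;; Hypothetical helper: force true/false keys, then pull out both groups.
(defn partition-by-pred
  [pred coll]
  (let [groups (group-by (comp boolean pred) coll)]
    ;; default to [] so a missing group is an empty vector, not nil
    [(get groups true []) (get groups false [])]))

(partition-by-pred even? [1 5 2 4 6 3 7 9 0 8])
;; => [[2 4 6 0 8] [1 5 3 7 9]]
```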

The nice thing about group-by is that it is more general than just splitting a sequence into passing and failing items. You can easily use this function to group your sequence into as many categories as you need. Very useful and flexible. That is probably why group-by is in clojure.core instead of separate.
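
As a small illustration of grouping into more than two categories, here is a sketch that groups by remainder; by-remainder is just an illustrative name:

```clojure
;; Group the numbers 0..9 by their remainder modulo 3.
(def by-remainder
  (group-by #(rem % 3) (range 10)))

by-remainder
;; => {0 [0 3 6 9], 1 [1 4 7], 2 [2 5 8]}
```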

score 4 · Answer 2 · answered Apr 14 '11 at 15:13

Part of clojure.contrib.seq-utils:

user> (use '[clojure.contrib.seq-utils :only [separate]])
nil
user> (separate even? [1, 5, 2, 4, 6, 3, 7, 9, 0, 8])
[(2 4 6 0 8) (1 5 3 7 9)]

Jürgen Hötzel

OP seems pretty excited about traversing the sequence only once; `separate` goes through it twice: it's basically implemented as `((juxt filter remove) pred coll)`, except a little less concise. – amalloy Apr 14 '11 at 17:43

kotarak · Answer 3 · 2011-04-15T07:45:51.050

Please note that the answers of Jürgen, Adrian and Mikera all traverse the input sequence twice.

(defn single-pass-separate
  [pred coll]
  (reduce (fn [[yes no] item]
            (if (pred item)
              [(conj yes item) no]
              [yes (conj no item)]))
          [[] []]
          coll))
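
The accepted answer mentions that group-by uses transients. The same idea could be applied to the reduce above; the following is a sketch under that assumption, with single-pass-separate-transient as a hypothetical name. Whether it is measurably faster has not been checked here.

```clojure
;; Hypothetical variant: same single eager pass, but accumulating into
;; transient vectors and making them persistent at the end.
(defn single-pass-separate-transient
  [pred coll]
  (let [[yes no] (reduce (fn [[yes no] item]
                           (if (pred item)
                             [(conj! yes item) no]   ; conj! returns the transient to keep using
                             [yes (conj! no item)]))
                         [(transient []) (transient [])]
                         coll)]
    [(persistent! yes) (persistent! no)]))

(single-pass-separate-transient even? [1 5 2 4 6 3 7 9 0 8])
;; => [[2 4 6 0 8] [1 5 3 7 9]]
```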

A single pass can only be eager. Lazy has to be two pass plus weakly holding onto the head.

Edit: lazy-single-pass-separate is possible but hard to understand. And in fact, I believe this is slower than a simple second pass. But I haven't checked that.

(defn lazy-single-pass-separate
  [pred coll]
  (let [coll       (atom coll)
        yes        (atom clojure.lang.PersistentQueue/EMPTY)
        no         (atom clojure.lang.PersistentQueue/EMPTY)
        fill-queue (fn [q]
                     ;; stop once the input is exhausted; otherwise a finite coll would loop forever
                     (while (and (zero? (count @q)) (seq @coll))
                       (locking coll
                         (when (zero? (count @q))
                           (when-let [s (seq @coll)]
                             (let [fst (first s)]
                               (if (pred fst)
                                 (swap! yes conj fst)
                                 (swap! no conj fst))
                               (swap! coll rest)))))))
        queue      (fn queue [q]
                     (lazy-seq
                       (fill-queue q)
                       (when (pos? (count @q))
                         (let [item (peek @q)]
                           (swap! q pop)
                           (cons item (queue q))))))]
    [(queue yes) (queue no)]))

This is as lazy as you can get:

user=> (let [[y n] (lazy-single-pass-separate even? (report-seq))] (def yes y) (def no n))
#'user/no
user=> (first yes)
">0<"
0
user=> (second no)
">1<"
">2<"
">3<"
3
user=> (second yes)
2

Looking at the above, I'd say "go eager" or "go two pass."
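
The transcript above calls report-seq, which is not defined in this answer. A minimal sketch of what such a helper might look like, assuming it is an infinite, unchunked lazy sequence of integers that prints each element as it is realized (this definition is a guess, not the author's code):

```clojure
;; Hypothetical reconstruction of report-seq: an infinite lazy sequence
;; 0, 1, 2, ... that prints ">n<" whenever element n is first realized.
(defn report-seq
  []
  (letfn [(step [n]
            (lazy-seq
              (prn (str ">" n "<"))     ; side effect runs once, on realization
              (cons n (step (inc n)))))]
    (step 0)))

;; Realizing the first three elements prints three markers.
(doall (take 3 (report-seq)))
```

Because each element is wrapped in its own lazy-seq, nothing is printed until a consumer actually asks for that element, which is what makes the laziness visible at the REPL.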

@kotarak: In the paper [The Genuine Sieve of Eratosthenes - Melissa E. O’Neill](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf), she shows a nice lazy implementation of the sieve in Haskell that uses a priority queue to hold the "next" increment for the prime/composite table. I wonder if something like that could be used to keep two lazy sequences with a single pass through the original. — Ralph, Apr 14 '11 at 15:37
@kotarak:"A single pass can only be eager." -- I wonder if that can be proven. — Ralph, Apr 14 '11 at 16:27
@ralph I think the second traversal is not worth the trouble. But calling the predicate twice is questionable, because it might potentially be expensive. If that is ok, you can go with @amalloys implementation at his link. — kotarak, Apr 15 '11 at 07:50
@kotarak my solution only calls the predicate once per item. I'm not sure whether you're claiming that's what it does, but at any rate it doesn't. — amalloy, Apr 17 '11 at 07:06
@amalloy ??? Now I'm completely confused. Here is what I tried to say: your solution does call the predicate once per item, but traverses things twice: filter + remove. The above doesn't, but is so far-fetched, complicated and (most likely) inefficient, that I said: "if the problem is an expensive predicate, just use @amalloy's solution. It calls the predicate once, and the second pass is probably much faster than this monster. If the second traversal is also a problem use the eager version." I'd probably tend to use yours. — kotarak, Apr 17 '11 at 07:15

mikera · Answer 4 · 2011-04-14T15:19:37.227

It's not hard to write something that does the trick:

(defn partition-2 [pred coll]
  ((juxt 
    (partial filter pred) 
    (partial filter (complement pred))) 
  coll))

(partition-2 even? (range 10))

=> [(0 2 4 6 8) (1 3 5 7 9)]

edited Apr 14 '11 at 15:19

answered Apr 14 '11 at 15:00

`((juxt filter remove) pred coll)` – amalloy Apr 14 '11 at 16:54
ah nice - Clojure always seems to have a trick for making things shorter! – mikera Apr 14 '11 at 19:53
@amalloy a blog on this that culminates with the (juxt filter remove) construction http://blog.jayfields.com/2011/08/clojure-partition-by-split-with-group.html – user7610 Oct 28 '14 at 22:17
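
For reference, the shorter form from the comments can be used like this. separate-twice is a hypothetical name chosen here as a reminder that filter and remove each make their own pass and each call the predicate:

```clojure
;; juxt builds a function that applies filter and remove to the same
;; arguments and returns both results in a vector.
(def separate-twice (juxt filter remove))

(separate-twice even? (range 10))
;; => [(0 2 4 6 8) (1 3 5 7 9)]
```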

amalloy · Answer 5 · 2011-04-15T19:55:48.483

Maybe see https://github.com/amalloy/clojure-useful/blob/master/src/useful.clj#L50 - whether it traverses the sequence twice depends on what you mean by "traverse the sequence".

Edit: Now that I'm not on my phone, I guess it's silly to link instead of paste:

(defn separate
  [pred coll]
  (let [coll (map (fn [x]
                    [x (pred x)])
                  coll)]
    (vec (map #(map first (% second coll))
              [filter remove]))))
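
The comments on kotarak's answer discuss whether this approach calls the predicate once per item even though filter and remove each walk the intermediate sequence. One way to check is to count the calls. The sketch below restates the function compactly under a hypothetical name, separate-once, and uses a hypothetical counting predicate:

```clojure
;; Hypothetical restatement: tag each item with the predicate's result once,
;; then let filter and remove select on the cached tag.
(defn separate-once
  [pred coll]
  (let [tagged (map (juxt identity pred) coll)]   ; pred is only called here
    (mapv #(map first (% second tagged)) [filter remove])))

;; Hypothetical instrumentation: count how often the predicate runs.
(def calls (atom 0))

(defn counting-even?
  [x]
  (swap! calls inc)
  (even? x))

(let [[evens odds] (separate-once counting-even? (range 10))]
  ;; force both lazy results, then read the counter
  [(doall evens) (doall odds) @calls])
;; => [(0 2 4 6 8) (1 3 5 7 9) 10]
```

The tagged sequence is a single lazy seq shared by both selections, so its elements, and therefore the predicate results, are computed once and cached; only the cheap selection on the cached tag happens twice.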
