Return a sequence with the elements not in common to two original sequences by using clojure

Question

I have two sequences, each of which can be a vector or a list. I want to return a sequence containing the elements that the two sequences do not have in common.

Here is an example:

(removedupl [1 2 3 4] [2 4 5 6]) = [1 3 5 6]
(removedupl [] [1 2 3 4]) = [1 2 3 4]

I am pretty puzzled now. This is my code:

(defn remove-dupl [seq1 seq2]
    (loop [a seq1 b seq2]
        (if (not= (first a) (first b))
            (recur a (rest b)))))

But I don't know what to do next.

Are the inputs always going to be sorted as in the examples, or is that just coincidence? — Alex, Feb 03 '16 at 18:36

Timothy Pratley · Answer 1 · 2016-02-04T06:34:23.020

6

I encourage you to think about this problem in terms of set operations:

;; clojure.set must be loaded before its functions can be used.
(require 'clojure.set)

(defn extrasection [& ss]
  (clojure.set/difference
    (apply clojure.set/union ss)
    (apply clojure.set/intersection ss)))

Such a formulation assumes that the inputs are sets.

(extrasection #{1 2 3 4} #{2 4 5 6})
=> #{1 6 3 5}

Lists, sequences, and vectors are easily converted by calling the (set ...) function on them.
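As a minimal sketch of that conversion, the wrapper below turns each input into a set before applying the same set operations. The name extrasection-of-seqs is hypothetical and not part of the answer. Note that the result is a set, so the original order and any repeated elements are lost.

```clojure
(require '[clojure.set :as cset])

;; extrasection as in the answer, repeated so this block is self-contained.
(defn extrasection [& ss]
  (cset/difference
    (apply cset/union ss)
    (apply cset/intersection ss)))

;; Hypothetical helper: accepts vectors, lists, or any other seqable collection.
(defn extrasection-of-seqs [& colls]
  (apply extrasection (map set colls)))

(extrasection-of-seqs [1 2 3 4] '(2 4 5 6))
;; => a set containing 1, 3, 5 and 6 (print order is not guaranteed)
```

If a vector is required, as in the question's examples, the result could be passed through vec, although the element order would then depend on the set's internal ordering.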

Even if you prefer to stick with a sequence-oriented solution, keep in mind that comparing every element of one sequence against the other is an O(n*n) task unless the sequences are sorted. A set can be constructed in one pass and its lookups are very fast, so checking for duplicates with a set is an O(n log n) task at worst.

edited Feb 04 '16 at 06:34

answered Feb 02 '16 at 23:36

Timothy Pratley


Nice one. I was looking for intersection but couldn't remember its name – m0skit0 Feb 02 '16 at 23:37
If both input sequences are sorted as in the example, it can be done in O(n) time without sets. Otherwise, sets are the way to go. – Alex Feb 03 '16 at 18:35
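A rough sketch of the linear-time approach mentioned in the comment above, written as a merge-style walk over both inputs. It assumes both inputs are sorted in ascending order, contain no repeated elements, and hold mutually comparable values such as numbers. The function name sorted-symmetric-difference is made up for illustration.

```clojure
;; Assumes xs and ys are sorted ascending, have no repeats,
;; and contain mutually comparable elements (e.g. numbers).
(defn sorted-symmetric-difference [xs ys]
  (loop [a (seq xs), b (seq ys), acc []]
    (cond
      (nil? a) (into acc b)   ; xs exhausted: the rest of ys is unique
      (nil? b) (into acc a)   ; ys exhausted: the rest of xs is unique
      :else
      (let [c (compare (first a) (first b))]
        (cond
          (neg? c) (recur (next a) b (conj acc (first a)))   ; only in xs
          (pos? c) (recur a (next b) (conj acc (first b)))   ; only in ys
          :else    (recur (next a) (next b) acc))))))        ; in both: skip

(sorted-symmetric-difference [1 2 3 4] [2 4 5 6])   ;; => [1 3 5 6]
(sorted-symmetric-difference [] [1 2 3 4])          ;; => [1 2 3 4]
```

Each step advances at least one of the two inputs, so the walk visits every element once. For unsorted inputs this sketch gives wrong results, and the set-based approach is the safer choice.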

m0skit0 · Answer 2 · 2016-02-02T23:43:01.657

2

I'm still new to Clojure but I think the functional mindset is more into composing functions than actually doing it "by hand", so I propose the following solution:

(defn remove-dupl [seq1 seq2]
  (concat
    (remove #(some #{%} seq1) seq2)
    (remove #(some #{%} seq2) seq1)))

EDIT: I think it is better if we define that remove part as a local function and reuse it:

(defn remove-dupl [seq1 seq2]
  (let [removing (fn [x y] (remove #(some #{%} x) y))]
    (concat (removing seq1 seq2) (removing seq2 seq1))))

EDIT2: As suggested by Timothy Pratley in the comments, using a set as the predicate is faster:

(defn remove-dupl [seq1 seq2]
  (let [removing (fn [x y] (remove (set x) y))]
    (concat (removing seq1 seq2) (removing seq2 seq1))))

edited Feb 02 '16 at 23:43

answered Feb 02 '16 at 23:19

m0skit0


Yes! The code is easier to read after using let. Since I am a new learner, would you please explain this expression in more detail: "(remove #(some #{%} x) y)"? I looked up the reference, but I still don't understand it. Thank you so much! – Xiufen Xu Feb 02 '16 at 23:36
(remove (set seq1) seq2) is faster: O(n log n) instead of O(n^2) – Timothy Pratley Feb 02 '16 at 23:39
@TimothyPratley Thanks for the correction, will update. – m0skit0 Feb 02 '16 at 23:42
@XiufenXu Check [this](https://clojuredocs.org/clojure.core/some#example-542692d5c026201cdc32708a) – m0skit0 Feb 02 '16 at 23:44
@XiufenXu About the #() notation, check [here](http://stackoverflow.com/questions/13204993/anonymous-function-shorthand) – m0skit0 Feb 02 '16 at 23:49
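To unpack the expression asked about in the comments: #{%} builds a one-element set containing the current item, a set can be called as a function that returns the element if it is a member and nil otherwise, and some returns the first truthy result of applying that function to the items of a collection. A step-by-step sketch, using illustrative values and a hypothetical name in-x? that do not appear in the thread:

```clojure
;; A set used as a function returns the element when present, else nil.
(#{2} 2)   ;; => 2
(#{2} 5)   ;; => nil

;; some returns the first truthy result of the predicate over the collection.
(some #{2} [1 2 3])   ;; => 2
(some #{9} [1 2 3])   ;; => nil

;; #(some #{%} x) is shorthand for (fn [item] (some #{item} x)),
;; i.e. "is item found anywhere in x?". Here x is [1 2 3 4].
(def in-x? (fn [item] (some #{item} [1 2 3 4])))

;; remove drops every element for which the predicate is truthy.
(remove in-x? [2 4 5 6])          ;; => (5 6)

;; The set-based variant from EDIT2 gives the same result with faster lookups.
(remove (set [1 2 3 4]) [2 4 5 6]) ;; => (5 6)
```

One caveat: because a set returns the element itself, this idiom cannot detect nil or false as members, since both results are falsey.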

Thumbnail · Answer 3 · 2016-02-03T23:22:26.610

There are several problems with your code.

It doesn't test for the end of either sequence argument.
It steps through b but not a.
It implicitly returns nil as soon as the two sequences have the same first element, because the if has no else branch.

You want to remove the common elements from the concatenated sequences. You have to work out the common elements first, otherwise you don't know what to remove. So ...

We use

1. clojure.set/intersection to find the common elements,
2. concat to stitch the collections together,
3. remove to remove (1) from (2),
4. vec to convert to a vector.

Thus

;; clojure.set must be loaded before its functions can be used.
(require 'clojure.set)

(defn removedupl [coll1 coll2]
  (let [common (clojure.set/intersection (set coll1) (set coll2))]
    (vec (remove common (concat coll1 coll2)))))

... which gives

(removedupl [1 2 3 4] [2 4 5 6]) ; [1 3 5 6]
(removedupl [] [1 2 3 4]) ; [1 2 3 4]

... as required.
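One difference from a purely set-based result may be worth illustrating: this version keeps the input order and any elements repeated within a single input. A small sketch with made-up inputs:

```clojure
(require 'clojure.set)

;; removedupl as defined above, repeated so this block is self-contained.
(defn removedupl [coll1 coll2]
  (let [common (clojure.set/intersection (set coll1) (set coll2))]
    (vec (remove common (concat coll1 coll2)))))

;; Only 2 is common to both inputs; order and the repeated 1 are preserved.
(removedupl [3 1 1 2] [2 5 4])   ;; => [3 1 1 5 4]

;; Lists work as well as vectors.
(removedupl '(1 2) '(2 3))       ;; => [1 3]
```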
