
Similar to this question: Inner-join in clojure

Is there a function for outer joins (left, right and full) performed on collections of maps in any of the Clojure libraries?

I guess it could be done by modifying the code of clojure.set/join, but this seems like a common enough requirement that it's worth checking whether it already exists.

Something like this:

(def s1 #{{:a 1, :b 2, :c 3}
          {:a 2, :b 2}})

(def s2 #{{:a 2, :b 3, :c 5}
          {:a 3, :b 8}})


;=> (full-join s1 s2 {:a :a})
;
;   #{{:a 1, :b 2, :c 3}
;     {:a 2, :b 3, :c 5}
;     {:a 3, :b 8}}

And the appropriate functions for left and right outer join, i.e. including the entries where there is no value (or nil value) for the join key on the left, right or both sides.

Goran Jovic

1 Answer


Sean Devlin's (of Full Disclojure fame) table-utils has the following join types:

  • inner-join
  • left-outer-join
  • right-outer-join
  • full-outer-join
  • natural-join
  • cross-join

It hasn't been updated in a while, but works in 1.3, 1.4 and 1.5. To make it work without any external dependencies:

  • replace fn-tuple with juxt
  • replace the whole (:use ) clause in the ns declaration with (:require [clojure.set :refer [intersection union]])
  • add the function map-vals from below:

either

(defn map-vals
  [f coll]
  (into {} (map (fn [[k v]] {k (f v)}) coll)))

or for Clojure 1.5 and up

(defn map-vals
  [f coll]
  (reduce-kv (fn [acc k v] (assoc acc k (f v))) {} coll))
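
As a quick sanity check of the replacement pieces, here is a minimal sketch showing what juxt and map-vals return on small inputs. The sample map and the var names juxt-result and map-vals-result are made up for illustration; how table-utils uses these helpers internally is not shown here.

```clojure
;; map-vals as given above (the `into` variant works on any Clojure version).
(defn map-vals
  [f coll]
  (into {} (map (fn [[k v]] {k (f v)}) coll)))

;; juxt builds a function that applies each given fn to its argument
;; and returns the results as a vector.
(def juxt-result ((juxt :a :b) {:a 1, :b 2, :c 3}))
;; juxt-result => [1 2]

;; map-vals applies f to every value and keeps the keys.
(def map-vals-result (map-vals inc {:a 1, :b 2}))
;; map-vals-result => {:a 2, :b 3}
```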

To use the library, call the function for the desired join type with two collections (two sets of maps as in the example above, or two SQL result sets) and at least one join fn. Since keywords are functions on maps, the join keys alone usually suffice:

=> (full-outer-join s1 s2 :a :a)
   ({:a 1, :c 3, :b 2}
    {:a 2, :c 5, :b 3}
    {:b 8, :a 3})
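
If copying the library is not an option, the same result can be approximated with a few lines of core Clojure. The following is only a sketch under stated assumptions: full-outer-join-sketch is a hypothetical name, it is not the table-utils implementation, and it assumes that on a key match the right-hand row's values should win (as in the expected output of the question).

```clojure
(require '[clojure.set :as set])

(def s1 #{{:a 1, :b 2, :c 3}
          {:a 2, :b 2}})

(def s2 #{{:a 2, :b 3, :c 5}
          {:a 3, :b 8}})

(defn full-outer-join-sketch
  "Hypothetical helper, not the table-utils code.
  Indexes both collections by their key fn, then merges every pair of rows
  that share a key. Rows without a partner are kept unchanged."
  [left right left-key right-key]
  (let [l-idx (group-by left-key left)
        r-idx (group-by right-key right)
        ks    (set/union (set (keys l-idx)) (set (keys r-idx)))]
    (set
      (for [k ks
            l (get l-idx k [nil])   ; [nil] stands in for a missing left row
            r (get r-idx k [nil])]  ; [nil] stands in for a missing right row
        (merge l r)))))             ; merge ignores nil, right values win

(full-outer-join-sketch s1 s2 :a :a)
;=> #{{:a 1, :b 2, :c 3} {:a 2, :b 3, :c 5} {:a 3, :b 8}}
```

Iterating only over the keys of the left index (or only the right one) instead of their union would give a left (or right) outer join along the same lines.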

If I remember correctly, Sean tried to get table-utils into contrib some time ago, but that never worked out. Too bad it never got its own project (on GitHub/Clojars). Every now and then a question about a library like this pops up on Stack Overflow or the Clojure Google group.

Another option might be using the datalog library from datomic to query clojure data structures. Stuart Halloway has some examples in his gists.

NielsK
  • Although I was initially reluctant to copy&paste code, this is what I eventually did to solve the problem. Thanks! – Goran Jovic Nov 09 '12 at 21:20
  • I had exactly the same hesitations, but having an SQL background these abstractions came naturally, so in the end I caved in. Perhaps I'll try get some time in to write such a library myself and push it to Clojars, but I'm not very familiar with github & Open Source stuff.. – NielsK Nov 10 '12 at 20:57
  • I tried with Clojure 1.6, after the above suggested fixes, it worked for the example above, but I have stack overflow error with 2 tables each with about 4,000 rows. Here is the error: "java.lang.StackOverflowError: null". I've already set my project with :jvm-opts ["-Xmx1024m"] please suggest any simple fix or alternative. Thanks! – Yu Shen Oct 02 '14 at 07:54
  • The stack overflow problem can be solved by replacing "reduce concat" by "apply concat" in the implementation of join-worker in the source code. – Yu Shen Oct 02 '14 at 12:57
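
The difference behind that last fix can be shown in isolation. This is a minimal sketch, not the join-worker code: chunks is a made-up stand-in for the many small row groups a join produces.

```clojure
;; Hypothetical stand-in for the per-key row groups built during a join.
(def chunks (map list (range 100000)))

;; (reduce concat chunks) wraps one lazy concat around another for every
;; chunk; realizing that deeply nested seq can exhaust the stack and throw
;; java.lang.StackOverflowError, regardless of the -Xmx heap setting.

;; apply concat passes all chunks to a single concat call, so the resulting
;; lazy seq is flat and can be realized safely.
(def flattened (apply concat chunks))

(count flattened)
;=> 100000
```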