Adding key-value pairs to maps in a list of maps from another list of maps in Clojure

Question

I have a list of maps

 ( {:path "first" :size "1 gb"}  
   {:path "second" :size "500 mb"}
  ...)

and another list of maps

 ( {:path "first" :size "1 gb" :date "1"}
   {:path "second" :size "500 mb" :date "1"}
   {:path "first" :size "0.9 gb" :date "2"}...
   {:path "second" :size "400 mb" :date "2"}...
 ...)

I want to get the first list of maps transformed to something like

( {:path "first" :sizeon1 "1 gb" :sizeon2 "0.9 gb"...}
  {:path "second" :sizeon1 "500 mb" :sizeon2 "400 mb"...}
  ....)

I am a Clojure noob and having a hard time doing this. Can you please help me out?

In your example you don't really need the first list, because its data is duplicated in the second one. Is that always true, or are there cases where some path value is present in the first list and absent from the second, or vice versa? If so, what is the expected result for these cases? — leetwinski, Jul 20 '16 at 10:38
Yes, this is always true. You are right, I just need a new list with the required result from the second list. The second list is sorted by date, i.e. all entries with date "1" first, then "2" and so on (edited). How can it be done? Also, can you please see superkonduktr's answer and answer my comment? — user3083633, Jul 20 '16 at 10:57

score 2 · Answer 1 · answered Jul 20 '16 at 08:15

It all becomes clear when you break down your task into smaller parts.

First, define a helper to create those :sizeon1 keys in the result dataset:

(defn date-key
  [date]
  (keyword (str "sizeon" date)))

Next, you want to reduce a collection of single path data into an aggregated map, assuming such a collection looks as you described:

[{:path "first" :size "1 gb" :date "1"}
 {:path "first" :size "0.9 gb" :date "2"}
 ;; ...
 ]

reduce is just the tool for that:

(defn reduce-path
  [path-data]
  (reduce
    ;; A function that takes an accumulator map and an element in the collection
    ;; from which you take date and size and assoc them under the appropriate keys
    (fn [acc el]
      (let [{:keys [date size]} el]
        (assoc acc (date-key date) size)))
    ;; A starting value for the accumulator containing the common path
    ;; for this collection
    {:path (:path (first path-data))}
    ;; The collection of single path data to reduce
    path-data))

Finally, take the raw dataset containing different paths, partition it by path, and map the reduce-path function onto it.

(def data
  [{:path "first" :size "1 gb" :date "1"}
   {:path "first" :size "0.9 gb" :date "2"}
   {:path "second" :size "500 mb" :date "1"}
   {:path "second" :size "400 mb" :date "2"}])

(->> data
     (partition-by :path)
     (map reduce-path))

Note that this code assumes that all entries for a given :path are adjacent in your initial data collection (for example, because it is already sorted by :path). Otherwise, partition-by will split one path into several groups, and the data will have to be prepared accordingly.
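To see what goes wrong when equal paths are not adjacent, here is a small sketch. It repeats date-key and reduce-path from this answer in compact form so that it runs on its own. The names by-date, fragmented and merged are made up for illustration; by-date holds the same four maps ordered by :date instead of :path. sort-by is documented as stable, so sorting by :path should keep the dates in order within each path.

```clojure
(defn date-key [date]
  (keyword (str "sizeon" date)))

(defn reduce-path [path-data]
  (reduce (fn [acc {:keys [date size]}]
            (assoc acc (date-key date) size))
          {:path (:path (first path-data))}
          path-data))

;; Hypothetical input: the same maps as above, but ordered by :date.
(def by-date
  [{:path "first"  :size "1 gb"   :date "1"}
   {:path "second" :size "500 mb" :date "1"}
   {:path "first"  :size "0.9 gb" :date "2"}
   {:path "second" :size "400 mb" :date "2"}])

;; partition-by only merges neighbours, so every map ends up in its own group.
(def fragmented
  (->> by-date (partition-by :path) (map reduce-path)))
;; => ({:path "first", :sizeon1 "1 gb"} {:path "second", :sizeon1 "500 mb"}
;;     {:path "first", :sizeon2 "0.9 gb"} {:path "second", :sizeon2 "400 mb"})

;; Sorting by :path first makes equal paths adjacent again.
(def merged
  (->> by-date (sort-by :path) (partition-by :path) (map reduce-path)))
;; => ({:path "first", :sizeon1 "1 gb", :sizeon2 "0.9 gb"}
;;     {:path "second", :sizeon1 "500 mb", :sizeon2 "400 mb"})
```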

Thanks a lot! As you said, partition-by won't work if the collection is not sorted by :path. My data has all the maps with :date "1" first, then :date "2" and so on. Can you please tell me what to do in such a situation? — user3083633, Jul 20 '16 at 09:32
In this case you should replace the `(partition-by :path)` line with two lines: `(group-by :path)` and `vals`. This has the same effect, as `group-by` groups items into a map whose values are collections similar to the `partition-by` results — leetwinski, Jul 20 '16 at 11:17
In addition to @leetwinski's suggestion, it's worth mentioning that your data lacks a sensible (ideally, numeric) field to perform sorting on. Then you could use `(sort-by :sortable-key collection)` to ensure the correct order before you feed the collection to the reducing function. Refer to https://clojuredocs.org/clojure.core/sort-by for some usage examples! — superkonduktr, Jul 20 '16 at 11:29
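
Put together, the two suggestions from these comments could look like the sketch below. It assumes that every :date is a string holding a whole number, which is true for the sample data but only a guess about the real data; Long/parseLong (JVM Clojure) turns it into a sortable number. The names unordered and aggregate are made up for this example, and date-key and reduce-path are repeated from the answer so the snippet runs on its own.

```clojure
(defn date-key [date]
  (keyword (str "sizeon" date)))

(defn reduce-path [path-data]
  (reduce (fn [acc {:keys [date size]}]
            (assoc acc (date-key date) size))
          {:path (:path (first path-data))}
          path-data))

;; Hypothetical input in no particular order.
(def unordered
  [{:path "second" :size "400 mb" :date "2"}
   {:path "first"  :size "1 gb"   :date "1"}
   {:path "first"  :size "0.9 gb" :date "2"}
   {:path "second" :size "500 mb" :date "1"}])

(defn aggregate [coll]
  (->> coll
       ;; Assumption: :date parses as a whole number, so "10" sorts after "2".
       (sort-by #(Long/parseLong (:date %)))
       ;; group-by collects all entries of a path, adjacent or not.
       (group-by :path)
       vals
       (map reduce-path)))

;; The order of the resulting maps is not guaranteed, so compare as a set.
(= (set (aggregate unordered))
   #{{:path "first"  :sizeon1 "1 gb"   :sizeon2 "0.9 gb"}
     {:path "second" :sizeon1 "500 mb" :sizeon2 "400 mb"}})
;; => true
```

The sort step only changes the order in which keys are added; the resulting maps are equal with or without it, because maps compare by content.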

Michiel Borkent · Answer 2 · 2016-07-20T16:34:01.200

2

(def data '({:path "first" :size "1 gb" :date "1"}
            {:path "second" :size "500 mb" :date "1"}
            {:path "first" :size "0.9 gb" :date "2"}
            {:path "second" :size "400 mb" :date "2"}))

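(defn- reduce-group [g]
  ;; Start from a map holding only :path, so the raw :size and :date
  ;; entries of the first element do not leak into the result.
  (reduce (fn [acc m] (assoc acc
                             (keyword (str "sizeon" (:date m)))
                             (:size m)))
          (select-keys (first g) [:path]) g))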

(let [groups (group-by :path data)]
  (map reduce-group (vals groups)))

edited Jul 20 '16 at 16:34

answered Jul 20 '16 at 08:27


I guess it's wrong, because the OP probably wanted to name keys according to their `:date` value (potentially there could be more: `:size3` etc.) — leetwinski, Jul 20 '16 at 10:34

leetwinski · Accepted Answer · 2016-07-20T11:47:06.983

What I would do is rethink the resulting data structure. I don't know how you would use the resulting collection, but naming keys :sizeonX leads to a mess of unpredictably named keys in the resulting maps, especially when the number of registered dates varies or some dates are missing (for example, dates 1 and 3 for the first path, and 1, 2, 3 and 5 for the second one). That makes retrieving the values much harder. To me it looks better to use this structure:

{:path "first" :sizes {"1" "500" "2" "1g" "10" "222"}}

This way the :sizes map is easy to iterate over and process.

This is how I would do it:

(def data '({:path "first" :size "1 gb" :date "1"}
            {:path "first" :size "0.9 gb" :date "3"}
            {:path "second" :size "500 mb" :date "1"}
            {:path "second" :size "700 mb" :date "2"}
            {:path "second" :size "400 mb" :date "3"}
            {:path "second" :size "900 mb" :date "5"}))

(map (fn [[k v]] {:path k
                  :sizes (into {} (map (juxt :date :size) v))})
     (group-by :path data))

;; ({:path "first", :sizes {"1" "1 gb", "3" "0.9 gb"}} 
;;  {:path "second", :sizes {"1" "500 mb", 
;;                           "2" "700 mb", 
;;                           "3" "400 mb", 
;;                           "5" "900 mb"}})
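
As a rough illustration of why the nested :sizes map is convenient, the sketch below works on a literal copy of the result shown above. The names rows and all-dates are made up for this example. Looking up a date that is missing for a path simply gives nil or a default, and collecting every date that occurs takes one expression.

```clojure
;; Literal copy of the result above, so the snippet runs on its own.
(def rows
  [{:path "first"  :sizes {"1" "1 gb" "3" "0.9 gb"}}
   {:path "second" :sizes {"1" "500 mb" "2" "700 mb"
                           "3" "400 mb" "5" "900 mb"}}])

;; Look up one size by date.
(get-in (first rows) [:sizes "3"])        ;; => "0.9 gb"

;; A date that is missing for this path falls back to the given default.
(get-in (first rows) [:sizes "2"] "n/a")  ;; => "n/a"

;; Every date that occurs for any path, as a sorted set of strings.
(def all-dates
  (into (sorted-set) (mapcat (comp keys :sizes)) rows))
;; => #{"1" "2" "3" "5"}
```

Note that the dates are strings here, so the sorted set orders them as text; "10" would come before "2".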

Update

But since you still need the structure from the question, I would do it like this:

(map (fn [[k v]]
       (into {:path k}
             (map #(vector (keyword (str "sizeon" (:date %)))
                           (:size %))
              v)))
     (group-by :path data))

;;({:path "first", :sizeon1 "1 gb", :sizeon3 "0.9 gb"} 
;; {:path "second", 
;;  :sizeon1 "500 mb", :sizeon2 "700 mb", 
;;  :sizeon3 "400 mb", :sizeon5 "900 mb"})

This is basically the same approach as @superkonduktr's variant.

I am sorry but I'll need them in the form of keys as they are actually columns in a table. Is there a way to convert this into such a form? Or some other way? — user3083633, Jul 20 '16 at 11:17
Also 1 more thing, how can I reorder the output of this function so that it first has the key :path and then :sizeon1 , :sizeon2...and so on ? Currently it is displaying :path at the end. — user3083633, Jul 21 '16 at 06:02
that's because hash-maps are unordered. In general you can't predict the position of the key in map. However, why would you need to sort keys in this case at all? You can still use `sorted-map` for that, if there is real need (and i guess there's no) — leetwinski, Jul 21 '16 at 06:13
so if you need it, replace `{:path k}` line with accumulator sorted-map: `(sorted-map-by #(cond (= :path %1) -1 (= :path %2) 1 :else (compare %1 %2)))` — leetwinski, Jul 21 '16 at 06:18
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/117907/discussion-between-user3083633-and-leetwinski). — user3083633, Jul 21 '16 at 11:11
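
A hedged sketch of the sorted-map idea from the comments: as far as I can tell, the comparator quoted there never returns 0 when both arguments are :path, and a sorted map needs a 0 result to find a key again on lookup. The variant below adds an equality check first. The names path-first and row are made up for this example, and the input map is a literal stand-in for one result row.

```clojure
(defn path-first
  "Comparator: :path sorts before every other key; the rest use `compare`."
  [a b]
  (cond
    (= a b)     0            ; equal keys must compare as 0 for lookups to work
    (= :path a) -1
    (= :path b) 1
    :else       (compare a b)))

;; Hypothetical result row, poured into a map sorted by the comparator above.
(def row
  (into (sorted-map-by path-first)
        {:sizeon2 "0.9 gb" :path "first" :sizeon1 "1 gb"}))

(keys row)   ;; => (:path :sizeon1 :sizeon2)
(:path row)  ;; => "first"
```

In the accepted answer's code this would mean using `(sorted-map-by path-first :path k)` as the starting value of `into` in place of `{:path k}`. The remaining keys are compared as keywords, so :sizeon10 would still sort before :sizeon2.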
