I have a list of maps

 ( {:path "first" :size "1 gb"}  
   {:path "second" :size "500 mb"}
  ...)

and another list of maps

 ( {:path "first" :size "1 gb" :date "1"}
   {:path "second" :size "500 mb" :date "1"}
   {:path "first" :size "0.9 gb" :date "2"}...
   {:path "second" :size "400 mb" :date "2"}...
 ...)

I want to get the first list of maps transformed to something like

( {:path "first" :sizeon1 "1 gb" :sizeon2 "0.9 gb"...}
  {:path "second" :sizeon1 "500 mb" :sizeon2 "400 mb"...}
  ....)

I am a Clojure noob and having a hard time doing this. Can you please help me out?

  • In your example you don't really need the first list, because its data is duplicated in the second one. Is that always true, or are there cases where some path value is present in the first list and absent from the second, or vice versa? If so, what is the expected result for those cases? – leetwinski Jul 20 '16 at 10:38
  • Yes, this is always true. You are right, I just need a new list with the required result built from the second list. The second list is sorted by date, i.e. all entries with date "1" first, then "2", and so on (edited). How can it be done? Also, can you please see superkonduktr's answer and answer my comment? – user3083633 Jul 20 '16 at 10:57

3 Answers

It all becomes clear when you break down your task into smaller parts.

First, define a helper to create those :sizeon1 keys in the result dataset:

(defn date-key
  [date]
  (keyword (str "sizeon" date)))

Next, you want to reduce the records for a single path into one aggregated map, assuming such a collection looks as you described:

[{:path "first" :size "1 gb" :date "1"}
 {:path "first" :size "0.9 gb" :date "2"}
 ;; ...
 ]

reduce is just the tool for that:

(defn reduce-path
  [path-data]
  (reduce
    ;; A function that takes an accumulator map and an element in the collection
    ;; from which you take date and size and assoc them under the appropriate keys
    (fn [acc el]
      (let [{:keys [date size]} el]
        (assoc acc (date-key date) size)))
    ;; A starting value for the accumulator containing the common path
    ;; for this collection
    {:path (:path (first path-data))}
    ;; The collection of single path data to reduce
    path-data))

Finally, take the raw dataset containing different paths, partition it by path, and map the reduce-path function onto it.

(def data
  [{:path "first" :size "1 gb" :date "1"}
   {:path "first" :size "0.9 gb" :date "2"}
   {:path "second" :size "500 mb" :date "1"}
   {:path "second" :size "400 mb" :date "2"}])

(->> data
     (partition-by :path)
     (map reduce-path))

Note that this code assumes that your initial data collection is already sorted by :path. Otherwise, partition-by will not work as you would expect, and the data will have to be prepared accordingly.
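
If the input is ordered by :date rather than by :path, as in the question, one possible adjustment is to swap partition-by for group-by followed by vals. The following is only a minimal sketch: `unsorted-data` and `aggregate-paths` are hypothetical names, and `date-key` and `reduce-path` are repeated from above so the snippet stands alone.

```clojure
(defn date-key [date]
  (keyword (str "sizeon" date)))

(defn reduce-path [path-data]
  (reduce (fn [acc {:keys [date size]}]
            (assoc acc (date-key date) size))
          {:path (:path (first path-data))}
          path-data))

;; Hypothetical sample, ordered by :date instead of :path
(def unsorted-data
  [{:path "first" :size "1 gb" :date "1"}
   {:path "second" :size "500 mb" :date "1"}
   {:path "first" :size "0.9 gb" :date "2"}
   {:path "second" :size "400 mb" :date "2"}])

;; partition-by only merges adjacent records, so this input
;; falls apart into four single-record partitions:
(count (partition-by :path unsorted-data)) ;; => 4

;; group-by collects every record with the same :path, adjacent or not
(defn aggregate-paths [data]
  (->> data
       (group-by :path)
       vals
       (map reduce-path)))

(aggregate-paths unsorted-data)
```

Each resulting map holds the :path plus one :sizeonN key per date. The order of the maps in the result follows the map returned by group-by, so it should not be relied on.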

superkonduktr
  • Thanks a lot! As you said, partition-by won't work if the collection is not sorted by :path. My data has all the maps with :date "1" first, then :date "2", and so on. Can you please tell me what to do in such a situation? – user3083633 Jul 20 '16 at 09:32
  • In this case you should replace the `(partition-by :path)` line with two lines: `(group-by :path)` and `vals`. This has the same effect, because `group-by` groups the items into a map whose values are collections similar to the `partition-by` results – leetwinski Jul 20 '16 at 11:17
  • In addition to @leetwinski's suggestion, it's worth mentioning that your data lacks a sensible (ideally numeric) field to sort on. With such a field you could use `(sort-by :sortable-key collection)` to ensure the correct order before you feed the collection to the reducing function. Refer to https://clojuredocs.org/clojure.core/sort-by for some usage examples! – superkonduktr Jul 20 '16 at 11:29
  • I have created a function for that using sort-by. – user3083633 Jul 20 '16 at 12:32
(def data '({:path "first" :size "1 gb" :date "1"}
            {:path "second" :size "500 mb" :date "1"}
            {:path "first" :size "0.9 gb" :date "2"}
            {:path "second" :size "400 mb" :date "2"}))

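(defn- reduce-group [g]
  ;; Start from a map holding only :path, so the :size and :date keys
  ;; of the first record do not leak into the result.
  (reduce (fn [acc m] (assoc acc
                             (keyword (str "sizeon" (:date m)))
                             (:size m)))
          (select-keys (first g) [:path]) g))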

(let [groups (group-by :path data)]
  (map reduce-group (vals groups)))
Michiel Borkent
  • I guess it's wrong, because the OP probably wanted to name the keys according to their `:date` value (potentially there could be more: `:size3`, etc.) – leetwinski Jul 20 '16 at 10:34

What I would do is rethink the resulting data structure. I don't know how you plan to use the resulting collection, but naming keys :sizeonX leads to a mess of unpredictably named keys in the resulting maps, especially when the number of registered dates varies or some dates are missing (for example, dates 1 and 3 for the first path and 1, 2, 3 and 5 for the second). That makes retrieving these keys much harder. To me it looks better to use this structure:

{:path "first" :sizes {"1" "500" "2" "1g" "10" "222"}}

This way the :sizes map is easy to iterate over and process.

This is how I would do it:

(def data '({:path "first" :size "1 gb" :date "1"}
            {:path "first" :size "0.9 gb" :date "3"}
            {:path "second" :size "500 mb" :date "1"}
            {:path "second" :size "700 mb" :date "2"}
            {:path "second" :size "400 mb" :date "3"}
            {:path "second" :size "900 mb" :date "5"}))

(map (fn [[k v]] {:path k
                  :sizes (into {} (map (juxt :date :size) v))})
     (group-by :path data))

;; ({:path "first", :sizes {"1" "1 gb", "3" "0.9 gb"}} 
;;  {:path "second", :sizes {"1" "500 mb", 
;;                           "2" "700 mb", 
;;                           "3" "400 mb", 
;;                           "5" "900 mb"}})
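
To illustrate why the nested :sizes map is convenient to work with, here is a small sketch that looks up a single date and lists the dates recorded for a path. The names `rows`, `size-on` and `dates-for` are hypothetical and only serve this example; `rows` simply repeats the output shown above.

```clojure
(def rows
  [{:path "first"  :sizes {"1" "1 gb" "3" "0.9 gb"}}
   {:path "second" :sizes {"1" "500 mb" "2" "700 mb"
                           "3" "400 mb" "5" "900 mb"}}])

(defn size-on
  "Size recorded for `date` in `row`, or nil when that date is missing."
  [row date]
  (get-in row [:sizes date]))

(defn dates-for
  "Dates recorded for `row`, sorted as strings (so \"10\" sorts before \"2\")."
  [row]
  (sort (keys (:sizes row))))

(size-on (first rows) "3") ;; => "0.9 gb"
(size-on (first rows) "2") ;; => nil, no entry for that date
(dates-for (second rows))  ;; => ("1" "2" "3" "5")
```

A missing date shows up as a plain nil lookup, with no need to guess which :sizeonX keys a given map happens to contain.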

update

But since you still need the structure from the question, I would do it like this:

(map (fn [[k v]]
       (into {:path k}
             (map #(vector (keyword (str "sizeon" (:date %)))
                           (:size %))
              v)))
     (group-by :path data))

;;({:path "first", :sizeon1 "1 gb", :sizeon3 "0.9 gb"} 
;; {:path "second", 
;;  :sizeon1 "500 mb", :sizeon2 "700 mb", 
;;  :sizeon3 "400 mb", :sizeon5 "900 mb"})

This is basically the same as @superkonduktr's variant.

leetwinski
  • I am sorry but I'll need them in the form of keys as they are actually columns in a table. Is there a way to convert this into such a form? Or some other way? – user3083633 Jul 20 '16 at 11:17
  • Also 1 more thing, how can I reorder the output of this function so that it first has the key :path and then :sizeon1 , :sizeon2...and so on ? Currently it is displaying :path at the end. – user3083633 Jul 21 '16 at 06:02
  • That's because hash maps are unordered. In general you can't predict the position of a key in a map. However, why would you need to sort the keys in this case at all? You can still use `sorted-map` for that if there is a real need (and I guess there isn't) – leetwinski Jul 21 '16 at 06:13
  • So if you need it, replace the `{:path k}` accumulator with a sorted map that already contains the path: `(sorted-map-by #(cond (= %1 %2) 0 (= :path %1) -1 (= :path %2) 1 :else (compare %1 %2)) :path k)` – leetwinski Jul 21 '16 at 06:18
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/117907/discussion-between-user3083633-and-leetwinski). – user3083633 Jul 21 '16 at 11:11
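
The key-ordering idea from the comments above can be sketched as follows. This is only an illustration under stated assumptions: `path-first` and `aggregate-sorted` are hypothetical names, and the comparator returns 0 for equal keys so that looking up :path in the sorted map still works.

```clojure
(defn path-first
  "Comparator that puts :path before every other key and
  compares the remaining keys with the default `compare`."
  [a b]
  (cond
    (= a b)     0
    (= :path a) -1
    (= :path b) 1
    :else       (compare a b)))

(def data
  [{:path "first" :size "1 gb" :date "1"}
   {:path "second" :size "500 mb" :date "1"}
   {:path "first" :size "0.9 gb" :date "2"}
   {:path "second" :size "400 mb" :date "2"}])

(defn aggregate-sorted [data]
  (map (fn [[k v]]
         ;; The accumulator is a sorted map that already holds :path
         (into (sorted-map-by path-first :path k)
               (map (juxt #(keyword (str "sizeon" (:date %))) :size) v)))
       (group-by :path data)))

(keys (first (aggregate-sorted data))) ;; => (:path :sizeon1 :sizeon2)
```

Keywords are compared by name, so :sizeon10 would sort before :sizeon2. If the dates can reach two digits, the comparator would need to parse the numeric suffix instead.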