4

If I have a vector (def v [1 2 3]), I can replace the first element with (assoc v 0 666), obtaining [666 2 3]

But if I try to do the same after mapping over the vector:

(def v (map inc [1 2 3]))
(assoc v 0 666)

the following exception is thrown:

ClassCastException clojure.lang.LazySeq cannot be cast to clojure.lang.Associative

What's the most idiomatic way of editing or updating a single element of a lazy sequence?

Should I use map-indexed and alter only the index 0 or realize the lazy sequence into a vector and then edit it via assoc/update? The first has the advantage of maintaining the laziness, while the second is less efficient but maybe more obvious.

I guess for the first element I can also use drop and cons. Are there any other ways? I was not able to find any examples anywhere.

mrucci
  • 4,342
  • 3
  • 33
  • 35
  • What do you want to do, really? You could use map-indexed and change arbitrary values based on the position. Then it'd still be lazy. If you want to change some value in the middle of a lazy sequence, you first need to materialize the sequence. To change position 42 you first need to get to that point? – Benjamin Podszun Mar 22 '18 at 09:02

2 Answers2

5

What's the most idiomatic way of editing or updating a single element of a lazy sequence?

There's no built-in function for modifying a single element of a sequence/list, but map-indexed is probably the closest thing. It's not an efficient operation for lists. Assuming you don't need laziness, I'd pour the sequence into a vector, which is what mapv does i.e. (into [] (map f coll)). Depending on how you use your modified sequence, it may be just as performant to vectorize it and modify.

You could write a function using map-indexed to do something similar and lazy:

(defn assoc-seq [s i v]
  (map-indexed (fn [j x] (if (= i j) v x)) s))

Or if you want to do this work in one pass lazily without vector-izing, you can also use a transducer:

(sequence
  (comp
    (map inc)
    (map-indexed (fn [j x] (if (= 0 j) 666 x))))
  [1 2 3])

Realizing your use case is to only modify the first item in a lazy sequence, then you can do something simpler while preserving laziness:

(concat [666] (rest s))

Update re: comment on optimization: leetwinski's assoc-at function is ~8ms faster when updating the 500,000th element in a 1,000,000 element lazy sequence, so you should use his answer if you're looking to squeeze every bit of performance out of an inherently inefficient operation:

(def big-lazy (range 1e6))

(crit/bench
  (last (assoc-at big-lazy 500000 666)))
Evaluation count : 1080 in 60 samples of 18 calls.
            Execution time mean : 51.567317 ms
    Execution time std-deviation : 4.947684 ms
  Execution time lower quantile : 47.038877 ms ( 2.5%)
  Execution time upper quantile : 65.604790 ms (97.5%)
                  Overhead used : 1.662189 ns

Found 6 outliers in 60 samples (10.0000 %)
  low-severe     4 (6.6667 %)
  low-mild   2 (3.3333 %)
Variance from outliers : 68.6139 % Variance is severely inflated by outliers
=> nil

(crit/bench
  (last (assoc-seq big-lazy 500000 666)))
Evaluation count : 1140 in 60 samples of 19 calls.
            Execution time mean : 59.553335 ms
    Execution time std-deviation : 4.507430 ms
  Execution time lower quantile : 54.450115 ms ( 2.5%)
  Execution time upper quantile : 69.288104 ms (97.5%)
                  Overhead used : 1.662189 ns

Found 4 outliers in 60 samples (6.6667 %)
  low-severe     4 (6.6667 %)
Variance from outliers : 56.7865 % Variance is severely inflated by outliers
=> nil

The assoc-at version is 2-3x faster when updating the first item in a large lazy sequence, but it's no faster than (last (concat [666] (rest big-lazy))).

Taylor Wood
  • 15,886
  • 1
  • 20
  • 37
  • Thanks! That's exactly the answer I was looking for: a confirmation that there's no built-in function and some alternative implementations. I did non know about `mapv` for example, also the transducer version could be useful if the update is part of a longer chain of transformations. – mrucci Mar 22 '18 at 09:50
  • notice, that in case you have the collection of million items, and you need to modify the first (or middle item), you would still need to map over the whole collection when it would be realized. So it is totally suboptimal. – leetwinski Mar 22 '18 at 10:06
3

i would probably go with something generic like this, if this functionality is really needed (which i strongly doubt about):

(defn assoc-at [data i item]
  (if (associative? data)
    (assoc data i item)
    (if-not (neg? i)
      (letfn [(assoc-lazy [i data]
                (cond (zero? i) (cons item (rest data))
                      (empty? data) data
                      :else (lazy-seq (cons (first data)
                                            (assoc-lazy (dec i) (rest data))))))]
        (assoc-lazy i data))
      data)))

user> (assoc-at {:a 10} :b 20)
;; {:a 10, :b 20}

user> (assoc-at [1 2 3 4] 3 101)
;; [1 2 3 101]

user> (assoc-at (map inc [1 2 3 4]) 2 123)
;; (2 3 123 5)

another way is to use split-at:

(defn assoc-at [data i item]
  (if (neg? i)
    data
    (let [[l r] (split-at i data)]
      (if (seq r)
        (concat l [item] (rest r))
        data))))

notice that both this functions short circuit the coll traversal, which mapping approach doesn't. Here some quick and dirty benchmark:

(defn massoc-at [data i item]
  (if (neg? i)
    data
    (map-indexed (fn [j x] (if (== i j) item x)) data)))

(time (last (assoc-at (range 10000000) 0 1000)))
;;=> "Elapsed time: 747.921032 msecs"
9999999

(time (last (massoc-at (range 10000000) 0 1000)))
;;=> "Elapsed time: 1525.446511 msecs"
9999999
leetwinski
  • 17,408
  • 2
  • 18
  • 42
  • Why do you doubt it's needed? I have a lazy sequence representing a CSV file, for which I know I have to always "correct" the first row. If I can't alter the input CSV, what am I left with? – mrucci Mar 22 '18 at 09:46
  • well, i would say that your task here degrades to the altering the first item only. which is `(cons (do-something (first data)) (rest data))` . There are obviously other cases of altering the non associative collection's items, but it is almost always about updating item based on it's value, not index, so doing it with simple `map` is logical. – leetwinski Mar 22 '18 at 09:57