Find the elements of a LazySeq that have been realized

Question

I have a LazySeq of connections that are created when realized. If an exception occurs while attempting to create a connection, I'd like to iterate through all of the connections that have already been realized in the LazySeq and close them. Something like:

(try
  (dorun connections)
  (catch ConnectException _ (close-connections connections)))

This doesn't quite work though since close-connections will attempt to realize the connections again. I only want to close connections that have been realized, not realize additional connections. Any ideas for doing this?

Michał Marczyk · Accepted Answer · 2013-12-03T14:42:29.110

Code:

This returns the previously realized initial fragment of the input seq as a vector:

(defn take-realized [xs]
  (letfn [(lazy-seq? [xs]
            (instance? clojure.lang.LazySeq xs))]
    (loop [xs  xs
           out []]
      (if (or (and (lazy-seq? xs) (not (realized? xs)))
              (and (not (lazy-seq? xs)) (empty? xs)))
        out
        (recur (rest xs) (conj out (first xs)))))))

Testing at the REPL:

(defn lazy-printer [n]
  (lazy-seq
   (when-not (zero? n)
     (println n)
     (cons n (lazy-printer (dec n))))))

(take-realized (lazy-printer 10))
;= []

(take-realized (let [xs (lazy-printer 10)] (dorun (take 1 xs)) xs))
;=> 10
;= [10]

;; range returns a lazy seq...
(take-realized (range 20))
;= []

;; ...wrapping a chunked seq
(take-realized (seq (range 40)))
;= [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
;   17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]

;; NB. *each* chunk of range gets its own LazySeq wrapper,
;; so that it is possible to work with infinite (or simply huge) ranges

(Using ;=> to indicate a printout.)

Discussion:

realized? is indeed the way to go, as suggested by Nathan. However, as I explained in my comments on Nathan's answer, one must also make sure not to inadvertently call seq on one's input, as that would cause the previously unrealized fragments of the input seq to become realized. That means that functions such as not-empty and empty? are out, since they are implemented in terms of seq.

(In fact, it is fundamentally impossible to tell whether a lazy seq is empty without realizing it.)
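
A minimal sketch of this effect, using a made-up seq named xs (the names xs, before, answer and after are only for illustration): checking emptiness is enough to flip realized? from false to true.

```clojure
;; Hypothetical demo seq; nothing has forced it yet.
(def xs (lazy-seq (list 1 2 3)))

(def before (realized? xs)) ; false: the body of xs has not run
(def answer (empty? xs))    ; false, but empty? had to call seq to find out
(def after  (realized? xs)) ; true: the emptiness check itself forced xs
```

This is why take-realized above only calls empty? on values that are not lazy seqs.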

Also, while functions like lazify are useful for unchunking sequences, they do not prevent their underlying seqs from being realized in a chunked fashion; rather, they enable layers of processing (map, filter etc.) to operate in an unchunked fashion even while their original input seqs are chunked. There is in fact no connection at all between such a "lazified" / "unchunked" seq being realized and its underlying, possibly chunked seq being realized. (In fact there is no way to establish such a connection in the presence of other observers of the input seq; absent other observers, it could be accomplished, but only at the cost of making lazify considerably more tedious to write.)
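
The following sketch illustrates the mismatch. It copies lazify from the other answer on this page; the counter computed and the names source, unchunked and head are hypothetical, and the source is a vector seq because those are reliably chunked.

```clojure
;; lazify as defined in the other answer on this page.
(defn lazify [s]
  (if (empty? s)
    nil
    (lazy-seq (cons (first s) (lazify (rest s))))))

;; Hypothetical counter: how many source elements have been computed.
(def computed (atom 0))

;; seq on a vector is chunked, so map computes a whole chunk at a time.
(def source (map (fn [x] (swap! computed inc) x) (vec (range 100))))

(def unchunked (lazify source))
(def head (first unchunked)) ; 0

;; Only one link of the lazified seq is realized...
(realized? (rest unchunked)) ; false
;; ...but the underlying seq has computed its whole first chunk
;; (typically 32 elements), not just one.
@computed
```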

Good answer. Appears to work in the general sense, not just for the context laid out in the question, unlike my answer. I was hoping it would be possible to avoid both `loop`ing and `instance?`, but I guess both are pretty much necessary in this case. — Nathan Davis, Dec 03 '13 at 17:35
BTW, do you think it would be better to use `(instance? clojure.lang.IPending xs)` instead of `(instance? clojure.lang.LazySeq xs)`? That way, it would also work if someone decides to make their own version of `LazySeq` (as long as it implements `IPending`). Not sure why anyone would do that, but there are probably use cases if you stretch far enough. — Nathan Davis, Dec 03 '13 at 17:38
Cheers! @NathanDavis You're right, `IPending` is the "minimal type" and there's no reason not to use it. — Michał Marczyk, Dec 04 '13 at 04:25
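
For reference, here is a sketch of the IPending variant discussed in the comments above, applied to the scenario from the question. The connection code is simulated: open-connection, connection-seq, opened and to-close are hypothetical names, and the third "connection" is made to fail on purpose.

```clojure
(defn take-realized [xs]
  (letfn [(pending? [xs]
            (instance? clojure.lang.IPending xs))]
    (loop [xs  xs
           out []]
      (if (or (and (pending? xs) (not (realized? xs)))
              (and (not (pending? xs)) (empty? xs)))
        out
        (recur (rest xs) (conj out (first xs)))))))

;; Hypothetical stand-in for real connections: opening id 3 fails.
(def opened (atom []))

(defn open-connection [id]
  (when (= id 3)
    (throw (java.net.ConnectException. (str "cannot open " id))))
  (swap! opened conj id)
  {:id id})

;; One LazySeq link per connection, so nothing is opened ahead of time.
(defn connection-seq [ids]
  (lazy-seq
    (when-let [[id & more] (seq ids)]
      (cons (open-connection id) (connection-seq more)))))

(def connections (connection-seq [1 2 3 4 5]))

;; Collect the connections that would need closing after a failure.
(def to-close
  (try
    (dorun connections)
    []
    (catch java.net.ConnectException _
      (take-realized connections))))
;; to-close is [{:id 1} {:id 2}]; no further connections are opened.
```

In real code the catch clause would pass the result of take-realized to close-connections instead of returning it.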

score 3 · Answer 2 · edited May 23 '17 at 12:28

Update: While this answer will work for the context presented in the original question (running doall over a sequence and determining which elements were realized if there was an exception), it contains several flaws and is unsuitable for the general use suggested by the question title. It does, however, present a theoretical (but flawed) basis that might help in understanding Michał Marczyk's answer. If you are having trouble understanding that answer, this answer might help by breaking things down a little more. It also illustrates several pitfalls you might encounter. But otherwise, just ignore this answer.

LazySeq implements IPending, so theoretically this should be as easy as iterating over successive tail sequences until realized? returns false:

(defn successive-tails [s]
  (take-while not-empty
              (iterate rest s)))

(defn take-realized [s]
  (map first
       (take-while realized?
                   (successive-tails s))))

Now, if you truly have a 100% LazySeq from start to finish, that's it -- take-realized will return the items of s that have been realized.

Edit: Ok, not really. This will work for determining which items were realized before an exception was thrown. However, as Michał Marczyk points out, it will cause every item in the sequence to be realized in other contexts.

You can then write your cleanup logic like this:

(try
  (dorun connections) ; or doall
  (catch ConnectException _ (close-connections (take-realized connections))))

However, be aware that a lot of Clojure's "lazy" constructs are not 100% lazy. For example, range will return a LazySeq, but if you start resting down it, it turns into a ChunkedCons. Unfortunately, ChunkedCons does not implement IPending, and calling realized? on one will throw an exception. To work around this, we can use lazy-seq to explicitly build a LazySeq that will stay a LazySeq for any sequence:

(defn lazify [s]
  (if (empty? s)
    nil
    (lazy-seq (cons (first s) (lazify (rest s))))))
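
The problem that lazify works around can be reproduced with a small sketch. It uses map over a vector rather than range, on the assumption that this yields a chunked cons regardless of Clojure version; mapped, tail, tail-pending? and outcome are illustrative names.

```clojure
;; map over a vector yields a LazySeq whose realized form is a chunked cons.
(def mapped (map inc [1 2 3]))

;; rest forces mapped and hands back the remainder of the chunk,
;; which is no longer a LazySeq.
(def tail (rest mapped))

(def tail-pending? (instance? clojure.lang.IPending tail)) ; false

;; realized? expects an IPending, so it throws on the chunked tail.
(def outcome
  (try
    (realized? tail)
    (catch ClassCastException _ :not-ipending)))
;; outcome is :not-ipending
```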

Edit: As Michał Marczyk pointed out in a comment, lazify does not guarantee the underlying sequence is lazily consumed. In fact, it will probably realize previously unrealized items (but appears to only throw an exception the first time through). Its sole purpose is to guarantee that calling rest results in either nil or a LazySeq. In other words, it works well enough to run the example below, but YMMV.

Now if we use the same "lazified" sequence in both the dorun and the cleanup code, we will be able to use take-realized. Here's an example that illustrates how to build an expression that will return a partial sequence (the part before the failure) if an exception occurs while realizing it:

(let [v (for [i (lazify (range 100))]
          (if (= i 10)
            (throw (new RuntimeException "Boo!"))
            i))]
  (try
    (doall v)
    (catch Exception _ (take-realized v))))

This won't work, because (1) `not-empty` and `empty?` call `seq` on their arguments and will therefore realize the previously-unrealized parts of the lazy seqs; (2) `lazify` returns a seq which will be realized one element at a time, but of course its underlying seq will be realized in whichever fashion is usual for it, so the "lazified" seq can have only one link realized even while the underlying chunked seq has realized a full initial chunk. — Michał Marczyk, Dec 03 '13 at 14:21
Try `(defn lazy-printer [n] (lazy-seq (when-not (zero? n) (println n) (cons n (lazy-printer (dec n))))))`, `(take-realized (lazy-printer 10))`. This should print nothing, but in fact prints the integers 10 through 1. — Michał Marczyk, Dec 03 '13 at 14:21
@Michał_Marczyk thanks for the info. I suspected lazify was not as lazy as I wanted it to be (after all it does call `empty?`, which must check if there's a first, right?). It's good (but annoying) to know my suspicion was correct ;-). However, in this context, `lazify`'s main purpose is to ensure a call to `rest` would always return a `LazySeq` (the sequence will always be fully realized in the normal case). The important thing is, if realizing one of the items throws an exception, we can still get at the items that didn't. I edited the answer to explain that. — Nathan Davis, Dec 03 '13 at 17:20
BTW, I thought this question would be a lot easier to answer than it actually was. But chunked sequences kept confounding me. I was elated when I found something that worked at all. Thank you for the feedback. — Nathan Davis, Dec 03 '13 at 17:28
@NathanDavis Just to make sure we're on the same page: `lazify` is able to serve this purpose here because the exception is thrown during processing of the lazified seq and not during the realization of its underlying seq. Were it thrown during the realization of the underlying chunked seq, it would still prevent us from getting at the elements "before the exception". To see this, replace `(range 100)` with `(filter (fn [x] (/ (- 10 x))) (range 100))`; clearly this throws at 10, but we're still prevented from accessing the initial elements through the lazified seq in the `catch` clause. — Michał Marczyk, Dec 04 '13 at 04:47
The key point though is that I totally agree that `lazify` is great when we're worried about throwing exceptions while processing an input seq. Just wanted to point out that if exceptions happen to be thrown while a chunked seq is being created, unchunking won't save us (naturally, as the unchunking function will not even get called). Actually, now that I'm reading your answer and comments more closely, I don't think you're saying anything to contradict this, so perhaps these comments are unnecessary... In any case, `lazify` certainly does work in this example and it's a good point to make. — Michał Marczyk, Dec 04 '13 at 04:51
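
The case described in the last two comments can be sketched as follows, using the take-realized from the accepted answer. The source here is built on a vector (an assumption made so that it is chunked on any Clojure version), and source and outcome are illustrative names.

```clojure
;; take-realized from the accepted answer.
(defn take-realized [xs]
  (letfn [(lazy-seq? [xs]
            (instance? clojure.lang.LazySeq xs))]
    (loop [xs  xs
           out []]
      (if (or (and (lazy-seq? xs) (not (realized? xs)))
              (and (not (lazy-seq? xs)) (empty? xs)))
        out
        (recur (rest xs) (conj out (first xs)))))))

;; The predicate divides by zero at x = 10, i.e. in the middle of the
;; first chunk, while the chunked seq itself is being created.
(def source
  (filter (fn [x] (/ (- 10 x))) (vec (range 100))))

(def outcome
  (try
    (dorun source)
    :completed
    (catch ArithmeticException _
      (take-realized source))))
;; outcome is []: elements 0 through 9 passed the predicate, but the
;; first chunk never finished, so nothing counts as realized.
```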
