Newbie Problem Understanding Clojure Lazy Sequences

Question

I've just started learning Clojure and I'm puzzled by how lazy sequences work. In particular, I don't understand why these 2 expressions produce different results in the repl:

;; infinite range works OK
user=> (take 3 (map #(/(- % 5)) (range)))
(-1/5 -1/4 -1/3)

;; finite range causes error
user=> (take 3 (map #(/(- % 5)) (range 1000)))
Error printing return value (ArithmeticException) at clojure.lang.Numbers/divide (Numbers.java:188).
Divide by zero

I take the sequence of integers (0 1 2 3 ...) and apply a function that subtracts 5 and then takes the reciprocal. Obviously this causes a division-by-zero error if it's applied to 5. But since I'm only taking the first 3 values from a lazy sequence I wasn't expecting to see an exception.

The results are what I expected when I use all the integers, but I get an error if I use the first 1000 integers.

Why are the results different?

Clojure will sometimes chunk operations on lazy sequences, as an optimization. Why is Clojure chunking one expression but not the other? I do not know. The important thing is that when using lazy sequences, the code must still behave correctly if more elements are realized than were asked for. — Shannon Severance, Jul 10 '21 at 17:27
Thanks Shannon, that certainly explains the behaviour I'm seeing, but I'm disappointed to discover that Clojure does this. It seems to violate referential transparency - the 2 expressions in my post should produce the same result IMO. It also means the docs are not being completely honest - the 'range' doc says it returns a lazy sequence, but actually the sequence is lazyish rather than lazy. — BillyBadBoy, Jul 10 '21 at 17:38
I agree it's confusing, but I don't think you can say it violates referential transparency. What expression could you swap for its value to produce a different result? The "problem" is that the values produced by `(range)` and `(range 999)` differ in more ways than expected, but you can still replace either of them with their value and get the same result. — amalloy, Jul 10 '21 at 18:49
@amalloy My problem is: how should I think about sequences with exceptions? If I think of sequences as fully lazy then both my expressions should work. On the other hand, if I think of sequences as fully evaluated (not literally) then both expressions should throw exceptions. Neither interpretation works. I guess the problem for me is more basic than referential transparency. My expressions don't even have deterministic values (since they may or may not throw exceptions depending on chunking). I guess the takeaway for me is to follow ShannonSeverance's advice and be careful. — BillyBadBoy, Jul 10 '21 at 21:38

score 3 · Accepted Answer · answered Jul 11 '21 at 01:23

Clojure 1.1 introduced "chunked" sequences:

This can provide greater efficiency ... Consumption of chunked-seqs as normal seqs should be completely transparent. However, note that some sequence processing will occur up to 32 elements at a time. This could matter to you if you are relying on full laziness to preclude the generation of any non-consumed results. [Section 2.3 of "Changes to Clojure in Version 1.1"]

In your example, (range) appears to produce a seq that realizes one element at a time, whereas (range 1000) produces a chunked seq. map consumes a chunked seq one chunk at a time and itself produces a chunked seq. So when take asks for the first element, the function passed to map is called 32 times, on the values 0 through 31. That chunk includes 5, which is where the divide-by-zero comes from.
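
The difference can be observed by counting how often the mapped function actually runs. This is a minimal sketch, assuming Clojure 1.7 or later; count-calls is a hypothetical helper name, not part of clojure.core, and the exact count for the chunked case depends on the chunk size used by the Clojure version at hand.

```clojure
;; count-calls is a hypothetical helper for illustration only.
;; It realizes just the first element of (map f coll) and reports
;; how many times f was invoked to get there.
(defn count-calls [coll]
  (let [calls (atom 0)]
    (first (map (fn [x] (swap! calls inc) x) coll))
    @calls))

(count-calls (range))       ; => 1, the seq is realized one element at a time
(count-calls (range 1000))  ; more than 1: a whole chunk is processed
                            ; (32 when the chunk holds 32 elements)
```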

I believe it is wisest to write code that still works when any seq-producing function (at any arity) returns a chunked seq with an arbitrarily large chunk size.
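
One way to follow that advice, sketched here under the assumption that a nil result is acceptable for the undefined case, is to make the mapped function total so that realizing extra elements cannot throw. safe-reciprocal is a hypothetical helper, not a library function.

```clojure
;; safe-reciprocal is a hypothetical helper: it returns nil instead of
;; throwing when asked for the reciprocal of zero.
(defn safe-reciprocal [x]
  (when-not (zero? x)
    (/ x)))

;; Works whether or not the source seq is chunked, because applying the
;; function to the unrequested element 5 no longer throws.
(take 3 (map #(safe-reciprocal (- % 5)) (range 1000)))
;; => (-1/5 -1/4 -1/3)
```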

If one writes a seq-producing function that is not chunked, I do not know whether one can rely on library functions like map and filter, in current and future versions, not to convert the seq into a chunked seq.
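
For illustration, a commonly seen workaround is to wrap the source in a lazy seq that hands out one element at a time. This is only a sketch: unchunk is a hypothetical helper, not part of clojure.core, and it relies on map leaving unchunked input unchunked, which is the very guarantee in doubt here.

```clojure
;; unchunk is a hypothetical helper: it rebuilds coll as a plain lazy seq
;; of cons cells, so consumers do not see it as a chunked seq.
(defn unchunk [coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (first s) (unchunk (rest s))))))

;; With current map behaviour the function is applied only to 0, 1 and 2,
;; so the element 5 is never reached.
(take 3 (map #(/ (- % 5)) (unchunk (range 1000))))
;; => (-1/5 -1/4 -1/3)
```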

But why the difference? What implementation details make (range) and (range 1000) produce different kinds of seq?

Range is implemented in clojure.core.
(range) is defined as (iterate inc' 0).
Ultimately iterate's functionality is provided by the Iterate class in Iterate.java.
(range end) is defined, when end is a long, as (clojure.lang.LongRange/create end)
The LongRange class lives in LongRange.java.

Looking at the two Java files, it can be seen that the LongRange class implements IChunkedSeq and the Iterate class does not. (Exercise left for the reader.)
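
The same thing can be checked at the REPL without reading the Java sources. A small sketch, assuming Clojure 1.7 or later, where range is implemented as described above:

```clojure
;; Inspect the concrete classes behind the two forms of range.
(class (range))              ; => clojure.lang.Iterate
(class (range 1000))         ; => clojure.lang.LongRange

;; chunked-seq? tests for the clojure.lang.IChunkedSeq interface.
(chunked-seq? (range))       ; => false
(chunked-seq? (range 1000))  ; => true
```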

Speculation

The implementation of clojure.lang.Iterate does not chunk because iterate can be given a function of arbitrary complexity, and the efficiency gained from chunking can easily be overwhelmed by the cost of computing more values than needed.
The implementation of (range) relies on iterate instead of a custom optimized Java class that does chunking because the (range) case is not believed to be common enough to warrant optimization.

I think I have a mental model now. An expression involving clojure seqs will produce the same result as a similar expression using fully-lazy seqs subject to the condition that realising unrequested elements has no observable side-effects (like throwing exceptions). So as long as I avoid side-effects (and exceptions) I can reason about seqs as if they were fully-lazy. My example expressions did not meet this condition, so divergent behaviour shouldn't have surprised me. Thanks to you and @amalloy for your help. — BillyBadBoy, Jul 11 '21 at 08:00
