Clojure - StackOverflowError while iterating over lazy collection

Question

I am currently implementing a solution for one of the Project Euler problems, namely the Sieve of Eratosthenes (https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes), in Clojure. Here's my code:

(defn cross-first-element [coll]
  (filter #(not (zero? (rem % (first coll)))) coll))

(println
  (last
    (map first
      (take-while
        (fn [[primes sieve]] (not (empty? sieve)))
        (iterate
          (fn [[primes sieve]] [(conj primes (first sieve)) (cross-first-element sieve)])
          [[] (range 2 2000001)])))))

The basic idea is to have two collections: the primes already retrieved from the sieve, and the remaining sieve itself. We start with empty primes, and until the sieve is empty, we pick its first element, append it to primes, and then cross out its multiples from the sieve. When the sieve is exhausted, we know that primes holds all prime numbers below two million.

Unfortunately, although it works well for a small upper bound of the sieve (say 1000), with the full range it throws java.lang.StackOverflowError, with a long stack trace in which the following sequence repeats:

...
clojure.lang.RT.seq (RT.java:531)
clojure.core$seq__5387.invokeStatic (core.clj:137)
clojure.core$filter$fn__5878.invoke (core.clj:2809)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:51)
...

Where is the conceptual error in my solution? How do I fix it?

Just a heads up, you may want to look into `->>`. It will make this code much cleaner: https://gist.github.com/carcigenicate/ec7147870b2398aa2ca598f5effdd74b — Carcigenicate, Apr 14 '19 at 21:26
@Carcigenicate thanks for pointing this out, I've been learning Clojure for a week and I still have much to learn :) — Michał Kaczanowicz, Apr 14 '19 at 21:36
Almost an *exact* dupe of this (https://stackoverflow.com/questions/29073273/lazy-seq-and-stack-overflow-for-infinite-sequences), although I'm not sure the fix will apply here. — Carcigenicate, Apr 14 '19 at 21:43
https://medium.com/@nikosfertakis/clojure-lazy-evaluation-and-stack-overflow-exceptions-1b8ee732ba0b — Carcigenicate, Apr 14 '19 at 21:46
It's been a while, but I've found an interesting article specifically about this mathematical problem in the context of functional programming :) Perhaps it will also help somebody else understand why this approach is wrong and so inefficient: [article](https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) — Michał Kaczanowicz, Sep 12 '19 at 12:23

leetwinski · Accepted Answer · 2019-04-15T12:13:50.923

The reason is that the filter call in your cross-first-element is lazy. It doesn't actually filter the collection on each iterate step; it only wraps the previous lazy sequence in another one, so the filter calls 'stack up'. When you finally need a resulting element, the whole chain of test functions has to be executed, roughly like this:

(#(not (zero? (rem % (first coll1))))
  (#(not (zero? (rem % (first coll2))))
    (#(not (zero? (rem % (first coll3))))
       ;; ... and so on, one more nested call for every prime already taken from the sieve

This deep nesting is what overflows the stack.
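The stacking effect can be reproduced in isolation. This is a minimal sketch, not code from the question: stacked-filters is a hypothetical helper that wraps a small range in n lazy filter layers, and the depth of 200000 is an arbitrary value assumed to be larger than a default JVM stack can handle.

```clojure
;; stacked-filters is a hypothetical helper, not part of the question's code.
(defn stacked-filters
  "Wraps a small seq in n lazy filter layers without realizing any of them."
  [n]
  (reduce (fn [s _] (filter pos? s)) (range 1 10) (range n)))

;; A shallow stack of filters is realized without trouble.
(first (stacked-filters 10)) ; => 1

;; A deep stack has to descend through every layer before the first element
;; can be produced, so on a default-sized JVM stack this is expected to throw.
(try
  (first (stacked-filters 200000))
  (catch StackOverflowError _ :overflow))
```

Building the nested sequence is cheap and never fails; the error only appears at the moment the first element is requested.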

The simplest solution in your case is to make the filtering eager. You can do that by using filterv instead of filter, or by wrapping the call in (doall (filter ...)).
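A minimal sketch of the eager variant, assuming filterv as the fix. The primes-below function is a hypothetical name introduced here; it uses loop/recur instead of iterate and take-while so that the last element left in the sieve also ends up in the result.

```clojure
(defn cross-first-element
  "Eagerly removes every multiple of the first element of coll."
  [coll]
  (let [p (first coll)]
    (filterv #(not (zero? (rem % p))) coll)))

;; primes-below is a hypothetical name, not from the question.
(defn primes-below
  "Returns a vector of all primes smaller than limit."
  [limit]
  (loop [primes []
         sieve  (vec (range 2 limit))]
    (if (empty? sieve)
      primes
      ;; The sieve is a fully realized vector at every step,
      ;; so no chain of pending filters can build up.
      (recur (conj primes (first sieve))
             (cross-first-element sieve)))))

(primes-below 30) ; => [2 3 5 7 11 13 17 19 23 29]
```

Each step still walks the whole remaining sieve, so this removes the stack overflow but not the slowness.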

Even with eager filtering, though, your solution is really slow. I would rather use loop and native arrays for that.
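One way the loop-and-array approach could look is sketched below. It is an illustration under assumptions rather than the answerer's own code: sieve-primes is a hypothetical name, and the choice of a boolean array that marks composite numbers is one of several possible layouts.

```clojure
;; sieve-primes is a hypothetical name, not from the answer.
(defn sieve-primes
  "Returns a vector of all primes <= limit, using a mutable boolean array."
  [^long limit]
  (let [^booleans composite (boolean-array (inc limit))]
    (loop [i 2
           primes (transient [])]
      (cond
        (> i limit)        (persistent! primes)
        (aget composite i) (recur (inc i) primes)
        :else
        (do
          ;; i is prime: mark its multiples, starting at i*i, as composite.
          (loop [j (* i i)]
            (when (<= j limit)
              (aset composite j true)
              (recur (+ j i))))
          (recur (inc i) (conj! primes i)))))))

(sieve-primes 30) ; => [2 3 5 7 11 13 17 19 23 29]
```

Unlike the filter-based version, this crosses out multiples by stepping through the array, without testing every remaining number for divisibility.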

Alan Thompson · Answer 2 · 2019-04-15T06:03:36.857

You have (re-)discovered that having nested lazy sequences can sometimes be problematic. Here is one example of what can go wrong (it is non-intuitive).

If you don't mind using a library, the problem is much simpler with a single lazy wrapper around an imperative loop. That is what lazy-gen and yield give you (a la "generators" in Python):

(ns tst.demo.core
  (:use demo.core tupelo.test)
  (:require [tupelo.core :as t]))

(defn unprime? [primes-so-far candidate]
  (t/has-some? #(zero? (rem candidate %)) primes-so-far))

(defn primes-generator []
  (let [primes-so-far (atom [2])]
    (t/lazy-gen
      (t/yield 2)
      (doseq [candidate (drop 3 (range))] ; 3..inf
        (when-not (unprime? @primes-so-far candidate)
          (t/yield candidate)
          (swap! primes-so-far conj candidate))))))

(def primes (primes-generator))

(dotest
  (is= (take 33 primes)
    [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 ])

  ; first prime over 10,000
  (is= 10007 (first (drop-while #(< % 10000) primes)))

  ; the 10,000'th prime (https://primes.utm.edu/lists/small/10000.txt)
  (is= 104729 (nth primes 9999)) ; about 12 sec to compute
)

We could also use loop/recur to control the loop, but it's easier to read with an atom to hold the state.

Unless you really, really need a lazy & infinite solution, the imperative solution is so much simpler:

(defn primes-upto [limit]
  (let [primes-so-far (atom [2])]
    (doseq [candidate (t/thru 3 limit)]
      (when-not (unprime? @primes-so-far candidate)
        (swap! primes-so-far conj candidate)))
    @primes-so-far))

(dotest
  (is= (primes-upto 100)
    [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]) )
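For comparison, the same trial-division idea can be written with loop/recur and no atom. This is a sketch that assumes only clojure.core (no tupelo); not-prime? and primes-upto-loop are hypothetical names chosen to avoid clashing with the functions above.

```clojure
;; not-prime? and primes-upto-loop are hypothetical names, not from the answer.
(defn not-prime?
  "True if candidate is divisible by any of the primes found so far."
  [primes-so-far candidate]
  (boolean (some #(zero? (rem candidate %)) primes-so-far)))

(defn primes-upto-loop
  "Returns a vector of the primes from 2 through limit (inclusive)."
  [limit]
  (loop [candidate     3
         primes-so-far [2]]
    (if (> candidate limit)
      primes-so-far
      (recur (inc candidate)
             (if (not-prime? primes-so-far candidate)
               primes-so-far
               (conj primes-so-far candidate))))))

(primes-upto-loop 30) ; => [2 3 5 7 11 13 17 19 23 29]
```

The accumulated primes are passed along as a loop binding, so there is no mutable state to dereference.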
