TL;DR: no, the two algorithms are not substantially different.
Your definition, `primes = 2 : (tail primes) ++ ....`, says that `head primes = 2` and `head (tail primes) = head ((tail primes) ++ ....) = head (tail primes)`. And that is of course problematic: it causes infinite recursion.
The smallest fix to your code while preserving its intent is probably
```haskell
firstPrimes1 :: Int -> [Int]
firstPrimes1 1 = [2]
firstPrimes1 n = firstPrimes1 (n-1) ++
      take 1 [x | x <- [3,4..],
                  and [ mod x k > 0 | k <- firstPrimes1 (n-1)]]
```

(This uses `take 1 ...` in place of your `[head ...]`.)
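A quick sanity check (a hypothetical helper name, assuming the definition above is in scope):

```haskell
-- the first five primes, found by repeated full trial division
checkFirst5 :: Bool
checkFirst5 = firstPrimes1 5 == [2,3,5,7,11]
```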
It is unbelievably slow (looks exponential, or worse). But it should have been, of course,
```haskell
firstPrimes2 1 = [2]
firstPrimes2 n = let { ps = firstPrimes2 (n-1) } in
                 ps ++
                 take 1 [x | x <- [3,4..],
                             and [ mod x k > 0 | k <- ps]]
```
which is now simply very slow, about cubic in time complexity. But it really should have been this:
```haskell
firstPrimes2b 2 = [2]
firstPrimes2b n = let { ps = firstPrimes2b (n-1) } in
                  ps ++
                  take 1 [x | x <- [last ps+1..],
                              and [ mod x k > 0 | k <- ps]]
```
which now behaves as if quadratic, and is indeed much faster than its predecessor in concrete terms as well.
To structure it like the Fibonacci stream, it could be written as
```haskell
import Data.List (inits)

primes3 = 2 : concatMap foo [1..]
  where
    foo n = let { ps = take n primes3 } in
            take 1 [ x | x <- [last ps+1..],
                         and [ mod x k > 0 | k <- ps]]

-- or
primes4 = 2 : concatMap bar (tail (inits primes4))
  where
    bar ps = take 1 [ x | x <- [last ps+1..],
                          and [ mod x k > 0 | k <- ps]]

-- or even
primes5 = 2 : [ p | (ps, q) <- zip (tail (inits primes5)) primes5
                  , p <- take 1 [ x | x <- [q+1..],
                                      and [ mod x k > 0 | k <- ps]] ]
```
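All three define the same stream of primes; for instance a quick equality check on a finite prefix (a hypothetical test, assuming the definitions above are loaded):

```haskell
-- any finite prefix should agree across the three variants
agreeOnPrefix :: Bool
agreeOnPrefix = take 10 primes3 == take 10 primes4
             && take 10 primes4 == take 10 primes5
```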
Indeed it looks like it follows an inductive pattern, specifically that of complete a.k.a. "strong" induction, `forall n. (forall k < n. P(k)) => P(n)`.
So it is not fundamentally different from the Fibonacci calculation, although the latter refers only to the previous two elements whereas this one refers to all the previous elements while adding the new one. But just like the Fibonacci stream, this sequence too is defined ultimately in terms of itself: `primes = ..... primes ......`.
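For comparison, here is the usual corecursive Fibonacci stream (a standard definition, shown only to make the parallel explicit; it is not taken from the question):

```haskell
-- the classic self-referential stream: each element is built
-- from the two preceding ones
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
```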
The `inits` makes `bar` refer to the previously known primes `ps` explicitly while adding one more to them at each step (expressed by `take 1`), just like you wanted. `concatMap` collects all the new one-element segments produced by each invocation of `bar`.
But why should that be only one prime? Couldn't we safely produce more than one new prime from the `k` known previous primes? Must we really test the candidates against all the preceding primes, or can we use the well-known shortcut which you also mention in the question? Can we make it follow the pattern of complete prefix induction, `forall n. (forall k < floor(sqrt(n)). P(k)) => P(n)`, so that only O(log log n) expansion steps are needed to get to the `n`th prime?
Could we be producing longer segments on each step from each prefix of the primes sequence (which sequence always stays the same, of course), thus referring not to all the preceding primes for each candidate, but only to a much smaller portion of them?...
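As a hint of where that shortcut leads, here is ordinary trial division bounded by the square root (a sketch only, under the assumed name `primesTD`; it is not yet the segmented construction the questions above point at):

```haskell
-- test each odd candidate only against the primes not exceeding
-- its square root -- the "well-known shortcut" mentioned above
primesTD :: [Int]
primesTD = 2 : 3 : [ x | x <- [5,7..]
                       , all (\p -> mod x p > 0)
                             (takeWhile (\p -> p*p <= x) primesTD) ]
```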
The most direct expression of the true sieve of Eratosthenes in Haskell is
```haskell
import qualified Data.List.Ordered as O (minus)

primes = map head $ scanl O.minus [2..] [[p, p+p ..] | p <- primes]
```

(With its obvious semantics, `minus` is easy to implement yourself, if you don't want to load it from the data-ordlist package.)
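For completeness, a minimal stand-in with the same semantics as `Data.List.Ordered.minus` (ordered difference of two non-decreasing lists; a sketch, not the package's actual code):

```haskell
-- remove the elements of the second ordered list from the first;
-- both lists are assumed non-decreasing, and may be infinite
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : minus xt ys
    EQ ->     minus xt yt
    GT ->     minus xs yt
minus xs _     = xs       -- e.g. [1..10] `minus` [3,6..] == [1,2,4,5,7,8,10]
```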
Although Rev. S. Horsley, when he (re?-)introduced it in 1772,(*) described the sieve of Eratosthenes as the equivalent of
```haskell
oprimes = map head $
          scanl (O.minus . tail) [3,5..] [[p*p, p*p+2*p ..] | p <- oprimes]

primes2 = 2 : oprimes

primesUpTo n = 2 : map head a ++ takeWhile (<= n) b
  where
    (a, b:_) = span ((<= n) . (^2) . head) $
               scanl (O.minus . tail) [3,5..] [[p*p, p*p+2*p ..] | p <- oprimes]
```
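A quick sanity check of the bounded variant (a hypothetical test name, assuming the definitions above are in scope):

```haskell
-- the primes not exceeding 30, as produced by primesUpTo
checkUpTo30 :: Bool
checkUpTo30 = primesUpTo 30 == [2,3,5,7,11,13,17,19,23,29]
```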
Running `length $ primesUpTo n` is immensely faster than `length . takeWhile (<= n) $ primes`. Can you see why?
Can you fix `primes2` so it becomes as fast as `primesUpTo` in accessing its `n`th element? It can follow your original thought, extending the known segment of primes, step by step, as alluded to in the previous section.
Also, do note that no `isPrime` function is used here at all, which is the hallmark of the true sieve of Eratosthenes: it does not test for primality, it generates the composites, and gets the primes in between the composites for free.
How the first `scanl` code works: it starts with the sequence `[2,3,4,5,...]`. Then it makes a note to remove `[2,4,6,8,...]` from it, and is left with the equivalent of `[3,5,7,9,...]`, i.e. coprimes({2}).

(This works, even though the lists are infinite, because Haskell has lazy evaluation: only as much calculation is performed as is demanded by the needs of producing the program's final output.)

Then it makes a note to remove from them the list `[3,6,9,12,..]`, and is left with coprimes({2,3}). At each stage it takes the `head` off the sequence-at-that-point-in-time, and puts that head element aside, thus forming the resulting sequence of primes.
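To see that laziness at work (a hypothetical check, using the first `scanl` definition of `primes` above; the name `firstTen` is assumed):

```haskell
-- forcing ten elements forces only as many sieve steps as needed
firstTen :: [Integer]
firstTen = take 10 primes    -- == [2,3,5,7,11,13,17,19,23,29]
```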
(The same could be coded with `iterate` (or `unfoldr`, etc.). It's a nice exercise, and it can help clarify what's going on there exactly. When you do this, you'll see that you'll be re-creating the primes sequence as part of the arguments to the step function being iterated (the current sequence of the first k primes' coprimes, and the next, k+1-th prime, to remove its multiples from that sequence). The `scanl` versions refer to the original sequence of primes explicitly, taking the primes from it one after another, but it's the same thing.)
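One possible shape of that exercise (a sketch under an assumed simplification: the state carried between steps is just the current coprimes list, whose head is the next prime; the name `primesIter` is hypothetical):

```haskell
import qualified Data.List.Ordered as O (minus)

-- each step removes the multiples of the current head (the next
-- prime) from the rest of the current coprimes list; map head
-- then collects the prime that was put aside at each step
primesIter :: [Integer]
primesIter = map head $ iterate step [2..]
  where
    step (p:xs) = O.minus xs [p, p+p ..]
```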
The second `scanl` variant only enumerates each prime's odd multiples, starting each enumeration from the prime's square (so e.g. for 3 it's `[9,15,21,27,...]`, and for 7 it's `[49,63,77,91,...]`). It still starts that enumeration for each prime though, not only for the primes whose squares fall within the limit; that's why it has to make special arrangements to stop as soon as it's okay for it to stop, in the `primesUpTo` function. Which is the key to its efficiency.
(*) p. 314 of Philosophical Transactions, Vol. XIII.
see also: `minus` defined and used here, or here.