TL;DR: no, the two algorithms are not substantially different.
Your definition, `primes = 2 : (tail primes) ++ ....`, says that `head primes = 2` and `head (tail primes) = head ((tail primes) ++ ....) = head (tail primes)`. And that is of course problematic: it causes infinite recursion.
The smallest fix to your code while preserving its intent is probably
```haskell
firstPrimes1 :: Int -> [Int]
firstPrimes1 1 = [2]
firstPrimes1 n = firstPrimes1 (n-1) ++
      take 1 [x | x <- [3,4..],
                  and [ mod x k > 0 | k <- firstPrimes1 (n-1)]]
```

(This uses `take 1 ...` in place of your `[head ...]`.)
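A quick sanity check (a hypothetical helper name, assuming the definition above is in scope):

```haskell
-- the first five primes, found by repeated full trial division
checkFirst5 :: Bool
checkFirst5 = firstPrimes1 5 == [2,3,5,7,11]
```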
It is unbelievably slow (looks exponential, or worse). But it should have been, of course,
```haskell
firstPrimes2 1 = [2]
firstPrimes2 n = let { ps = firstPrimes2 (n-1) } in
                 ps ++
                 take 1 [x | x <- [3,4..],
                             and [ mod x k > 0 | k <- ps]]
```
which is now simply very slow, about cubic in time complexity. But it really should have been this:
```haskell
firstPrimes2b 2 = [2]
firstPrimes2b n = let { ps = firstPrimes2b (n-1) } in
                  ps ++
                  take 1 [x | x <- [last ps+1..],
                              and [ mod x k > 0 | k <- ps]]
```
which now behaves as if quadratic, and is indeed much faster than its predecessor in concrete terms as well.
To structure it like the Fibonacci stream, it could be written as
```haskell
import Data.List (inits)

primes3 = 2 : concatMap foo [1..]
  where
    foo n = let { ps = take n primes3 } in
            take 1 [ x | x <- [last ps+1..],
                         and [ mod x k > 0 | k <- ps]]

-- or
primes4 = 2 : concatMap bar (tail (inits primes4))
  where
    bar ps = take 1 [ x | x <- [last ps+1..],
                          and [ mod x k > 0 | k <- ps]]

-- or even
primes5 = 2 : [ p | (ps, q) <- zip (tail (inits primes5)) primes5
                  , p <- take 1 [ x | x <- [q+1..],
                                      and [ mod x k > 0 | k <- ps]] ]
```
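All three define the same stream of primes; for instance a quick equality check on a finite prefix (a hypothetical test, assuming the definitions above are loaded):

```haskell
-- any finite prefix should agree across the three variants
agreeOnPrefix :: Bool
agreeOnPrefix = take 10 primes3 == take 10 primes4
             && take 10 primes4 == take 10 primes5
```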
Indeed it looks like it follows an inductive pattern, specifically that of complete a.k.a. "strong" induction, `forall n. (forall k < n. P(k)) => P(n)`.
So it is not fundamentally different from the Fibonacci calculation, although the latter refers only to the previous two elements whereas this one refers to all the previous elements while adding the new one. But just like the Fibonacci stream, this sequence too is defined ultimately in terms of itself: `primes = ..... primes ......`.
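For comparison, here is the usual corecursive Fibonacci stream (a standard definition, shown only to make the parallel explicit; it is not taken from the question):

```haskell
-- the classic self-referential stream: each element is built
-- from the two preceding ones
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
```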
The `inits` makes `bar` refer to the previously known primes `ps` explicitly while adding one more to them at each step (expressed by `take 1`), just like you wanted. `concatMap` collects all the new one-element segments produced by each invocation of `bar`.
But why should that be only one prime? Couldn't we safely produce more than one new prime from the `k` known previous primes? Must we really test the candidates against all the preceding primes, or can we use the well-known shortcut which you also mention in the question? Can we make it follow the pattern of complete prefix induction, `forall n. (forall k < floor(sqrt(n)). P(k)) => P(n)`, so that only O(log log n) expansion steps are needed to get to the `n`th prime?
Could we be producing longer segments on each step from each prefix of the primes sequence (which sequence always stays the same, of course), thus referring not to all the preceding primes for each candidate, but only to a much smaller portion of them?...
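As a hint of where that shortcut leads, here is ordinary trial division bounded by the square root (a sketch only, under the assumed name `primesTD`; it is not yet the segmented construction the questions above point at):

```haskell
-- test each odd candidate only against the primes not exceeding
-- its square root -- the "well-known shortcut" mentioned above
primesTD :: [Int]
primesTD = 2 : 3 : [ x | x <- [5,7..]
                       , all (\p -> mod x p > 0)
                             (takeWhile (\p -> p*p <= x) primesTD) ]
```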
The most direct expression of the true sieve of Eratosthenes in Haskell is
```haskell
import qualified Data.List.Ordered as O (minus)

primes = map head $ scanl O.minus [2..] [[p, p+p ..] | p <- primes]
```

(With its obvious semantics, `minus` is easy to implement yourself, if you don't want to load it from the data-ordlist package.)
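For completeness, a minimal stand-in with the same semantics as `Data.List.Ordered.minus` (ordered difference of two non-decreasing lists; a sketch, not the package's actual code):

```haskell
-- remove the elements of the second ordered list from the first;
-- both lists are assumed non-decreasing, and may be infinite
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : minus xt ys
    EQ ->     minus xt yt
    GT ->     minus xs yt
minus xs _     = xs       -- e.g. [1..10] `minus` [3,6..] == [1,2,4,5,7,8,10]
```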
Although Rev. S. Horsley, when he (re?-)introduced it in 1772,(*) described the sieve of Eratosthenes as the equivalent of
```haskell
oprimes = map head $
          scanl (O.minus . tail) [3,5..] [[p*p, p*p+2*p ..] | p <- oprimes]

primes2 = 2 : oprimes

primesUpTo n = 2 : map head a ++ takeWhile (<= n) b
  where
    (a, b:_) = span ((<= n) . (^2) . head) $
               scanl (O.minus . tail) [3,5..] [[p*p, p*p+2*p ..] | p <- oprimes]
```
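A quick sanity check of the bounded variant (a hypothetical test name, assuming the definitions above are in scope):

```haskell
-- the primes not exceeding 30, as produced by primesUpTo
checkUpTo30 :: Bool
checkUpTo30 = primesUpTo 30 == [2,3,5,7,11,13,17,19,23,29]
```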
Running `length $ primesUpTo n` is immensely faster than `length . takeWhile (<= n) $ primes`. Can you see why?
Can you fix `primes2` so it becomes as fast as `primesUpTo` in accessing its `n`th element? It can follow your original thought, extending the known segment of primes, step by step, as alluded to in the previous section.
Also, do note that no `isPrime` function is used here at all, which is the hallmark of the true sieve of Eratosthenes: it does not test for primality, it generates the composites, and gets the primes in between the composites for free.
How the first `scanl` code works: it starts with the sequence `[2,3,4,5,...]`. Then it makes a note to remove `[2,4,6,8,...]` from it, and is left with the equivalent of `[3,5,7,9,...]`, i.e. coprimes({2}).

(This works, even though the lists are infinite, because Haskell has lazy evaluation: only as much calculation is performed as is demanded by the needs of producing the program's final output.)

Then it makes a note to remove from them the list `[3,6,9,12,..]`, and is left with coprimes({2,3}). At each stage it takes the `head` off the sequence-at-that-point-in-time, and puts that head element aside, thus forming the resulting sequence of primes.
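To see that laziness at work (a hypothetical check, using the first `scanl` definition of `primes` above; the name `firstTen` is assumed):

```haskell
-- forcing ten elements forces only as many sieve steps as needed
firstTen :: [Integer]
firstTen = take 10 primes    -- == [2,3,5,7,11,13,17,19,23,29]
```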
(The same could be coded with `iterate` (or `unfoldr`, etc.). It's a nice exercise, and it can help clarify what's going on there exactly. When you do this, you'll see that you'll be re-creating the primes sequence as part of the arguments to the step function being iterated (the current sequence of the first k primes' coprimes, and the next, k+1-th prime, to remove its multiples from that sequence). The `scanl` versions refer to the original sequence of primes explicitly, taking the primes from it one after another, but it's the same thing.)
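One possible shape of that exercise (a sketch under an assumed simplification: the state carried between steps is just the current coprimes list, whose head is the next prime; the name `primesIter` is hypothetical):

```haskell
import qualified Data.List.Ordered as O (minus)

-- each step removes the multiples of the current head (the next
-- prime) from the rest of the current coprimes list; map head
-- then collects the prime that was put aside at each step
primesIter :: [Integer]
primesIter = map head $ iterate step [2..]
  where
    step (p:xs) = O.minus xs [p, p+p ..]
```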
The second `scanl` variant only enumerates each prime's odd multiples, starting each enumeration from the prime's square (so e.g. for 3 it's `[9,15,21,27,...]`, and for 7 it's `[49,63,77,91,...]`). It still starts that enumeration for each prime though, not only for the primes whose squares fall within the limit; that's why it has to make special arrangements to stop as soon as it's okay for it to stop, in the `primesUpTo` function. Which is the key to its efficiency.
(*) p. 314 of Philosophical Transactions, Vol. XIII.
see also: `minus` defined and used here, or here.