
To help me learn Haskell, I am working through the problems on Project Euler. After solving each problem, I compare my solution with the one on the Haskell wiki to learn better coding practices. Here is the wiki's solution to problem 3:

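primes :: [Integer]
primes = 2 : filter ((==1) . length . primeFactors) [3,5..]

-- Trial division by primes, stopping once p*p exceeds what is left of n.
primeFactors :: Integer -> [Integer]
primeFactors n = factor n primes
  where
    factor n (p:ps)
        | p*p > n        = [n]
        | n `mod` p == 0 = p : factor (n `div` p) (p:ps)
        | otherwise      = factor n ps

problem_3 :: Integer
problem_3 = last (primeFactors 317584931803)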

My naive reading of this is that primes is defined in terms of primeFactors, which is defined in terms of primes. So evaluating primeFactors 9 would follow this process:

  1. Evaluate factor 9 primes.
  2. Ask primes for its first element, which is 2.
  3. Ask primes for its next element.
  4. As part of this process, evaluate primeFactors 3.
  5. Ask primes for its first element, which is 2.
  6. Ask primes for its next element.
  7. As part of this process, evaluate primeFactors 3.
  8. ...

In other words, steps 2-4 would repeat infinitely. Clearly I am mistaken, as the algorithm terminates. What mistake am I making here?

Zero Piraeus
Matthew
    Because, as the answers here say, `primeFactors` only accesses `primes` until the square of a prime exceeds the number being tested, that code is equivalent to `primes = 2:[n | n<-[3..], all ((> 0).rem n) $ takeWhile ((<= n).(^2)) primes]`, which is clearly non-looping. – Will Ness Jun 23 '12 at 07:13
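
Will Ness's comprehension can be checked against the wiki definition by generating both lists side by side. A minimal sketch, assuming both versions live in one module under the hypothetical names `primesWiki` (the question's code) and `primesWN` (the comment's comprehension):

```haskell
-- Hypothetical name: the wiki definition from the question.
primesWiki :: [Integer]
primesWiki = 2 : filter ((== 1) . length . primeFactors) [3, 5 ..]

primeFactors :: Integer -> [Integer]
primeFactors n = factor n primesWiki
  where
    factor m (p:ps)
      | p * p > m      = [m]
      | m `mod` p == 0 = p : factor (m `div` p) (p:ps)
      | otherwise      = factor m ps

-- Hypothetical name: the comprehension from the comment. A candidate n is
-- kept if no already-generated prime p with p*p <= n divides it.
primesWN :: [Integer]
primesWN = 2 : [n | n <- [3 ..], all ((> 0) . rem n) $ takeWhile ((<= n) . (^ 2)) primesWN]

main :: IO ()
main = do
  print (take 10 primesWN)                            -- [2,3,5,7,11,13,17,19,23,29]
  print (take 1000 primesWiki == take 1000 primesWN)  -- True
```

Both lists are self-referential, and both terminate for the same reason: each candidate only consults primes whose squares do not exceed it, and those have already been produced.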



primeFactors only reads primes until it reaches one whose square exceeds the number being factored, i.e. just past that number's square root. It never looks further in the list, which means it never "catches up" to the number it's testing for inclusion in the list. Because Haskell is lazy, this means that the primeFactors test terminates.
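
One way to see this is to instrument the loop so it reports the primes it reads. A sketch, where `inspected` is a hypothetical helper that follows the same path as `factor` but returns the prime examined at each step (including the one that stops it) instead of the factors:

```haskell
primes :: [Integer]
primes = 2 : filter ((== 1) . length . primeFactors) [3, 5 ..]

primeFactors :: Integer -> [Integer]
primeFactors n = factor n primes
  where
    factor m (p:ps)
      | p * p > m      = [m]
      | m `mod` p == 0 = p : factor (m `div` p) (p:ps)
      | otherwise      = factor m ps

-- Hypothetical helper: mirrors `factor`, but records every prime read from
-- `primes`. A prime can appear more than once if it divides repeatedly.
inspected :: Integer -> [Integer]
inspected n = go n primes
  where
    go m (p:ps)
      | p * p > m      = [p]
      | m `mod` p == 0 = p : go (m `div` p) (p:ps)
      | otherwise      = p : go m ps

main :: IO ()
main = do
  print (inspected 97)  -- [2,3,5,7,11]: stops at 11 because 11*11 > 97
  -- Every candidate only reads primes smaller than itself:
  print (all (\n -> maximum (inspected n) < n) [3, 5 .. 9999])  -- True
```

The second check is the "never catches up" property: deciding whether an odd n belongs in `primes` only needs elements of `primes` that are already known.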

The other thing to remember is that primes isn't a function that recomputes a list each time you access it, but a single list that's constructed lazily and shared. So once the 15th element has been accessed, accessing it a second time is "free" (i.e. it doesn't require any further calculation).
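
This sharing can be observed with `trace` from `Debug.Trace`, which prints its message to stderr at the moment its result is forced. A minimal sketch, using a hypothetical top-level list `squares` as a stand-in for `primes` (the mechanism is the same: a top-level list with a monomorphic type is evaluated at most once per element):

```haskell
import Debug.Trace (trace)

-- Hypothetical stand-in for `primes`: each element announces (on stderr)
-- when it is actually computed.
squares :: [Integer]
squares = map (\x -> trace ("computing " ++ show x) (x * x)) [1 ..]

main :: IO ()
main = do
  print (squares !! 14)  -- forces the 15th element, so "computing 15" is traced
  print (squares !! 14)  -- element already evaluated: no second trace message
```

Both lines print 225, but the trace message appears only once. The type signature matters here: without it (and with the monomorphism restriction off, as in recent GHCi sessions), `squares` could be generalised to a type-class-polymorphic list, which GHC compiles as a function of a dictionary, and the list could then be rebuilt on every use.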

Lily Ballard
    Detail: accessing it a second time still costs 15 dereferences, since these are cons-cell lists; this can add up if you have hundreds of list elements. – amara Jan 29 '12 at 18:43
    @sparkleshy which would be quadratic in (approximately) `sqrt(n)`, i.e. add (approximately) linear cost to an *above-linear* calculation. – Will Ness Jun 20 '12 at 07:17

Kevin's answer is satisfactory, but allow me to pinpoint the flaw in your logic. It is #6 that is wrong. So we're evaluating primeFactors 3:

primeFactors 3          ==>
factor 3 primes         ==>
factor 3 (2 : THUNK)    ==>
2*2 > 3 == True         ==>
[3]

The THUNK never needs to be evaluated to determine that primeFactors 3 is [3].
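
This can be checked directly by handing `factor` a list whose tail is `undefined`: if evaluation ever touched that thunk, the program would crash. A sketch, assuming `factor` is lifted out of the `where` clause into a top-level function so it can be called on its own (the `[]` clause is an addition for completeness):

```haskell
-- `factor` from the question, lifted to top level so it can be called
-- with an arbitrary list of candidate primes.
factor :: Integer -> [Integer] -> [Integer]
factor n (p:ps)
  | p * p > n      = [n]
  | n `mod` p == 0 = p : factor (n `div` p) (p:ps)
  | otherwise      = factor n ps
factor _ []        = error "ran out of primes"

main :: IO ()
main = do
  -- primeFactors 3: only the head 2 is read, since 2*2 > 3
  print (factor 3 (2 : undefined))      -- [3]
  -- primeFactors 9 (the question's example): 2 and 3 are read, nothing after
  print (factor 9 (2 : 3 : undefined))  -- [3,3]
```

Matching `(p:ps)` binds `ps` without evaluating it, so the unevaluated tail is harmless as long as no guard needs the next prime.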

Dan Burton

primeFactors 3 doesn't ask primes for its next element, only for the first one, because 2*2 is already greater than 3.

asvyazin