6

I'm solving some classic problems in Haskell to develop my functional skills and I have a problem to implement an optimization suggested at this "Programming Praxis" site:

I have three solutions to this problem and the third one is too slow compared to the second solution. Can someone suggest some improvements to my code?

My implementations are:

-- primeira implementação
primes n
    | n < 2 = []
    | n == 2 = [2]
    | n `mod` 2 == 0 = primes'
    | otherwise = if (find (\x -> n `mod` x == 0) primes') == Nothing then
                      n:primes'
                  else
                      primes'
    where primes' = primes (n - 1)

-- segunda implementação
primes' :: Integer -> [Integer]
primes' n = sieve $ 2 : [3,5..n]
    where sieve :: [Integer] -> [Integer]
          sieve [] = []
          sieve l@(x:xs)
              | x*x >= n = l
              | otherwise = x : sieve list'
              where list' = filter (\y -> y `mod` x /= 0) xs

-- terceira implementação
primes'' :: Integer -> [Integer]
primes'' n = 2 : sieve 3 [3,5..n]
    where sieve :: Integer -> [Integer] -> [Integer]
          sieve _ [] = []
          sieve m l@(x:xs)
              | m*m >= n = l
              | x < m*m = x : sieve m xs
              | otherwise = sieve (m + 2) list'
              where list'= filter (\y -> y `mod` m /= 0) l
Will Ness
  • 70,110
  • 9
  • 98
  • 181
  • 1) Your English is fine. 2) I really like that you attached runnable code - so few questions do that lately. 3) You could also include your compile command - some people forget -O2 (and once upon a time an experimental -O3 existed and slowed down many programs but new-comers didn't know that). – Thomas M. DuBuisson Oct 04 '10 at 04:25
  • 10
    You may be interested in [The Genuine Sieve of Eratosthenes](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) paper. It covers several different implementations (including a fake sieve) and talks about basic performance characteristics as well as offering some ways of speeding up the algorithm. –  Oct 04 '10 at 05:59
  • 1
    One of the important messages from the paper “The Genuine Sieve of Eratosthenes” is that, although the codes like these three are often provided as “implementations of the sieve of Eratosthenes,” their performance is closer to that of the trial division algorithm (slow) than to that of the sieve of Eratosthenes (fast). – Tsuyoshi Ito Oct 04 '10 at 13:32
  • As others have implied, your three codes are not The Sieve of Eratosthenes but rather variations of Trial Division (the modulo `mod` operator causes a division). As to why the third code is so slow, it is double culling by all odd numbers rather than just by the previously found primes as in the second code. For variations of actual Sieve of Eratothenese implementations, see the [Haskell Primes Page](http://www.haskell.org/haskellwiki/Prime_numbers), with the solutions based on the STUArray being much faster although not purely functional (modifying array state inside the ST monad). – GordonBGood Feb 04 '14 at 00:52
  • surely you meant the first variant is "too slow" as it is quadratic (or worse), in *k* primes produced. the second is kind of an abortive sieve of Turner (i.e. sieve by trial division) by primes (which is around *~ k^1.5* give or take a *log* factor), and the third - the same but by odd numbers (which should make it worse than the second by another *log* factor). This can be assessed [empirically](http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth), too. – Will Ness Dec 25 '18 at 11:58

3 Answers3

6

First of all, mod is slow so use rem in situations where it doesn't matter (when you aren't dealing with negatives, basically). Secondly, use Criterion to show (to yourself) what is faster and what changes are actually optimizations. I know I'm not giving a full answer to you question with this, but its a good place for you (and other potential answerers) to start, so here's some code:

import List
import Criterion.Main

main = do
  str <- getLine
  let run f = length . f
      input = read str :: Integer
  defaultMain   [ bench "primes" (nf (run primes) input)
                , bench "primes'" (nf (run primes') input)
                , bench "primes''" (nf (run primes'') input)
                , bench "primesTMD" (nf (run primesTMD) input)
                , bench "primes'TMD" (nf (run primes'TMD) input)
                , bench "primes''TMD" (nf (run primes''TMD) input)
                ]
  putStrLn . show . length . primes'' $ (read str :: Integer)

-- primeira implementação
primes n
    | n < 2 = []
    | n == 2 = [2]
    | n `mod` 2 == 0 = primes'
    | otherwise = if (find (\x -> n `mod` x == 0) primes') == Nothing then
                      n:primes'
                  else
                      primes'
    where primes' = primes (n - 1)

primesTMD n
    | n < 2 = []
    | n == 2 = [2]
    | n `mod` 2 == 0 = primes'
    | otherwise = if (find (\x -> n `rem` x == 0) primes') == Nothing then
                      n:primes'
                  else
                      primes'
    where primes' = primesTMD (n - 1)

-- segunda implementação
primes' :: Integer -> [Integer]
primes' n = sieve $ 2 : [3,5..n]
    where sieve :: [Integer] -> [Integer]
          sieve [] = []
          sieve l@(x:xs)
              | x*x >= n = l
              | otherwise = x : sieve list'
              where list' = filter (\y -> y `mod` x /= 0) xs

primes'TMD :: Integer -> [Integer]
primes'TMD n = sieve $ 2 : [3,5..n]
    where sieve :: [Integer] -> [Integer]
          sieve [] = []
          sieve l@(x:xs)
              | x*x >= n = l
              | otherwise = x : sieve list'
              where list' = filter (\y -> y `rem` x /= 0) xs

-- terceira implementação
primes'' :: Integer -> [Integer]
primes'' n = 2 : sieve 3 [3,5..n]
    where sieve :: Integer -> [Integer] -> [Integer]
          sieve _ [] = []
          sieve m l@(x:xs)
              | m*m >= n = l
              | x < m*m = x : sieve m xs
              | otherwise = sieve (m + 2) list'
              where list'= filter (\y -> y `mod` m /= 0) l

primes''TMD :: Integer -> [Integer]
primes''TMD n = 2 : sieve 3 [3,5..n]
    where sieve :: Integer -> [Integer] -> [Integer]
          sieve _ [] = []
          sieve m l@(x:xs)
              | m*m >= n = l
              | x < m*m = x : sieve m xs
              | otherwise = sieve (m + 2) list'
              where list'= filter (\y -> y `rem` m /= 0) l

Notice the improved runtime of the variants using rem:

 $ ghc --make -O2 sieve.hs
 $./sieve
 5000
 ...
 benchmarking primes 
 mean: 23.88546 ms, lb 23.84035 ms, ub 23.95000 ms

 benchmarking primes'
 mean: 775.9981 us, lb 775.4639 us, ub 776.7081 us

 benchmarking primes''
 mean: 837.7901 us, lb 836.7824 us, ub 839.0260 us

 benchmarking primesTMD
 mean: 16.15421 ms, lb 16.11955 ms, ub 16.19202 ms

 benchmarking primes'TMD
 mean: 568.9857 us, lb 568.5819 us, ub 569.4641 us

 benchmarking primes''TMD
 mean: 642.5665 us, lb 642.0495 us, ub 643.4105 us

While I see you are doing this for your own education, its worth noting the related links of Primes on Haskell.org and the fast Primes package on hackage.

Thomas M. DuBuisson
  • 64,245
  • 7
  • 109
  • 166
  • Do you mind if I use one of these implementations in an open source project? – Rob Oct 29 '13 at 00:25
  • @robjb That code came from jahnke, not me. If you need primes in haskell and want a liberal license then see the [primes package](http://hackage.haskell.org/package/primes), which is licensed BSD3. – Thomas M. DuBuisson Oct 29 '13 at 16:25
6

Looks to me like the problem with your third revision is how you choose the next element to sift on. You indiscriminately increment by 2. The problem is that you then sift on unnecessary numbers. for example, in this version your eventually going to pass 9 as m, and you're going to do an extra recursion to filter on 9, even though it isn't even in the list, and thus you should have never picked it in the first place (since it would have been removed in the very first filter on 3)

Even though the second version doesn't start the filtering past the square of the number it sifts on, it never chooses an unnecessary sifting value.

In other words, I think you end up sifting on every odd number between 3 and n. Instead you should be sifting on every odd number that hasn't already been removed by a previous pass.

I think to correctly implement the optimization of starting the sieve at the square of the current sift value, you have to retain the front of the list while sifting on the back where back contains the elements >= the square of the sift value. I think this would force you to use concatenations, and I'm not so sure that the optimization is good enough to cancel out the overhead induced by using ++.

mayahustle
  • 86
  • 2
  • *"sifting on every odd number that hasn't already been removed by a previous pass"* IOW every odd prime. thus, to fix `primes''`, we need to pass the result of `tail $ primes'' (ceiling . sqrt . fromIntegral $ n)` as the first argument to `sieve` and advance on it by repeated `tail`s instead of `(+ 2)`. then it should become faster than `primes'` because unlike it, `primes''` properly postpones the filtering until the square of the number to filter by. – Will Ness Dec 25 '18 at 14:43
1

This is not optimized but expressive implementation: check video Sieve of Eratosthenes in haskell

import qualified Data.Set as Set(fromList,difference)
kr n l = (*n) <$> [2..l `div` n]
g n = difference (fromList [2..n]) (fromList $ concat $ ((flip kr) n) <$> [2..n])
Evg
  • 2,978
  • 5
  • 43
  • 58
  • 1
    this is all very cryptic. `g lim = difference (fromList [2..lim]) (fromList [ m | n <- [2..lim], m <- [n+n, n+n+n .. lim]])` is the same thing, but is more readily apparent what it does. :) – Will Ness Oct 20 '20 at 05:35
  • 1
    btw it is a nice approach. we can play with it [a little](https://pastebin.com/erdxDtZX), too, to make it more efficient. – Will Ness Oct 20 '20 at 22:17
  • 1
    so, `g lim | lim < 2 = [] | otherwise = difference (fromList $ 2 : [3,5..lim]) (fromList [ m | p <- drop 1 . g . floor . sqrt . fromIntegral $ lim), m <- [p*p, p*p+2*p .. lim]])`. looks nice. :) – Will Ness Oct 20 '20 at 22:27
  • @WillNess thanx for improvment! you are awesome :) – Evg Oct 20 '20 at 23:14