Haskell State monad vs state as parameter performance test

Question

I have started learning the State monad, and one idea bothers me: instead of passing an accumulator as a parameter, we can wrap everything in the State monad.

So I wanted to compare the performance of using the State monad against passing the accumulator as a parameter.

So I created two functions:

sum1 :: Int -> [Int] -> Int
sum1 x [] = x
sum1 x (y:xs) = sum1 (x + y) xs

and

sumState :: [Int] -> Int
sumState xs = execState (traverse f xs) 0
    where f n = modify (n+)

I compared them on the input list [1..1000000000].

sumState running time was around 15s
sum1 around 5s

We can see a clear winner, but then I realised that sumState can be optimised in two ways:

We can use the strict version of modify, modify'
We do not need the list of results that traverse produces, so we can use traverse_ instead

So the new optimised state function is:

sumState :: [Int] -> Int
sumState xs = execState (traverse_ f xs) 0
    where f n = modify' (n+)

which has running time around 350ms. This is a huge improvement. It was shocking.

Why does the modified sumState perform better than sum1? Can sum1 be optimised to match or even beat sumState?

I also tried other implementations of the sum:

using the built-in sum function, which gives me around 240ms (sum [1..x] :: Int)
using the strict foldl', which gives me the same result, around 240ms (with the type [Int] -> Int)

Does this mean it is better to use a foldl-style function or the State monad to carry an accumulator, instead of passing it as an argument to the function?

Thank you for help.

EDIT:

Each function was in a separate file with its own main function and compiled with the "-O2" flag.

main = do
    x <- (read . head ) <$> getArgs
    print $ <particular sum function> [1..x]

Runtime was measured with the time command on Linux.

How did you measure those times? Did you compile them? Did you profile them in release mode? Did you use a benchmarking suite like `criterion`? — Zeta, May 22 '21 at 13:36
I compiled them with -O2 via ghc and measured using the built-in time command in Linux — lukas kiss, May 22 '21 at 13:37
By the way, you have multiple questions in a single post. You should focus on a single one. — Zeta, May 22 '21 at 13:38
Hm, `sum` and `sum1` have the same 5s on my machine. Are you sure that you've used the same arguments for your tests? I only get ~240ms if I remove the last two zeroes. — Zeta, May 22 '21 at 13:51
For sum I used "(sum [1..x] :: Int)", maybe that is the reason. Have you also tried running sumState? — lukas kiss, May 22 '21 at 13:54
the time going from 15s to 350ms is a "huge increase"? you mean improvement? and then what, you wanted to also improve the speed of your `sum1` which currently takes 5s? is that it? please clarify. (if so, then adding strictness should help, like `sum1 !x [] = x ; sum1 !x (y:xs) = ...`). — Will Ness, May 22 '21 at 15:34
@lukaskiss Unfortunately no. I don't have access to my Haskell machine at the moment, and Stack on Debian@WSL turned out to break horribly, so I couldn't load any external dependencies. Sorry :( — Zeta, May 23 '21 at 05:26

Noughtmare · Accepted Answer · 2021-05-22T16:19:16.663

To give a bit more explanation as to why traverse is slower: traverse f xs has type State Int [()], and that [()] (a list of unit tuples) is built up during the summation. This prevents further optimizations and would cause a memory leak if you were not using lazy state.

Update: I think GHC should have been able to notice that that list of unit tuples is never used, so I opened a GHC issue.
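The difference between the two result types can be seen on a small list. This is only a sketch: the helper names addTo, withTraverse and withTraverse_ are made up for illustration, and it assumes the strict State from the transformers package (Control.Monad.Trans.State.Strict), which may not be the module used in the question. The result types are the same with lazy State.

```haskell
import Control.Monad.Trans.State.Strict (State, modify', runState)
import Data.Foldable (traverse_)

-- Hypothetical helper: add n to the running total, forcing the new state.
addTo :: Int -> State Int ()
addTo n = modify' (n +)

-- traverse keeps one () per element, so the result is a list of units.
withTraverse :: [Int] -> ([()], Int)
withTraverse xs = runState (traverse addTo xs) 0

-- traverse_ discards the per-element results and returns a single ().
withTraverse_ :: [Int] -> ((), Int)
withTraverse_ xs = runState (traverse_ addTo xs) 0
```

In GHCi, withTraverse [1, 2, 3] evaluates to ([(),(),()],6), while withTraverse_ [1, 2, 3] evaluates to ((),6). The list of units grows with the input, which is the extra work described above.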

In both cases, to get the best performance we want to combine (or fuse) the summation with the enumeration [1..x] into a tight recursive loop which simply increments and adds until it reaches x. The resulting code would look something like this:

sumFromTo :: Int -> Int -> Int -> Int
sumFromTo s x y
  | x == y = s + x
  | otherwise = sumFromTo (s + x) (x + 1) y

This avoids allocations for the list [1..x].

The base library achieves this optimization using foldr/build fusion, also known as short cut fusion. The sum, foldl' and traverse (for lists) functions are implemented using the foldr function and [1..x] is implemented using the build function. The foldr and build function have special optimization rules so that they can be fused. Your custom sum1 function doesn't use foldr and so it can never be fused with [1..x] in this way.
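To make "implemented using the foldr function" more concrete, here is a simplified sketch of a strict left fold written through foldr. The names foldlViaFoldr and sumUpTo are hypothetical, and the real definition in base differs in detail. Whether GHC actually fuses a hand-written version like this with [1..x] depends on inlining and the optimization flags, so treat it as an illustration of the shape rather than a guaranteed speed-up.

```haskell
-- Simplified sketch of a strict left fold expressed through foldr.
-- foldlViaFoldr is a hypothetical name, not a function from base.
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr k z0 xs = foldr step id xs z0
  where
    -- Force the accumulator, then pass the updated value to the rest of the fold.
    step v continue z = z `seq` continue (k z v)

-- Hypothetical wrapper: the consumer is a foldr, the producer is [1 .. x].
sumUpTo :: Int -> Int
sumUpTo x = foldlViaFoldr (+) 0 [1 .. x]
```

Because the only list consumer here is foldr, the fusion rules have something to match on. A directly recursive function such as sum1 pattern matches on the list itself and offers no such hook.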

score 0 · Answer 2 · answered May 25 '21 at 19:55

Ironically, the same problem that plagued your implementation of sumState is also the problem with sum1. You don't have strict accumulation, so you build up thunks like so:

sum1 0 [1, 2, 3]
sum1 (0 + 1) [2, 3]
sum1 ((0 + 1) + 2) [3]
sum1 (((0 + 1) + 2) + 3) []
(((0 + 1) + 2) + 3)
((1 + 2) + 3)
(3 + 3)
6

If you add strictness to sum1, you should see a dramatic improvement in efficiency because you eliminate the non-tail-recursive evaluation of the thunk (((0 + 1) + 2) + 3), which is the costly part of sum1. Using strict accumulation makes this much more efficient:

sum1 :: Int -> [Int] -> Int
sum1 x [] = x
sum1 x (y : xs) = x `seq` sum1 (x + y) xs

should give you comparable performance to sum (although as noted in another answer, GHC may not be able to use fusion properly to give you the truly magical performance of sum on the list [1..x]).
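The same strictness can also be written with bang patterns, as suggested in the comments on the question. A minimal sketch, assuming the BangPatterns extension is enabled; the name sumStrict is made up here to keep it apart from sum1:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical name sumStrict: same shape as sum1, with a strict accumulator.
sumStrict :: Int -> [Int] -> Int
sumStrict !acc []       = acc
sumStrict !acc (y : ys) = sumStrict (acc + y) ys
```

The bang on acc forces the accumulator each time the function is entered, so at most one pending addition exists at any point and no long chain of thunks builds up. For example, sumStrict 0 [1 .. 100] evaluates to 5050.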
