Performance of State in a certain case is slower than I would expect. Why?

Question

Consider the function reverseAndMinimum (slightly modified from another answer nearby):

import Control.Monad.State.Strict

reverseAndMinimum' :: Ord a => [a] -> State a [a] -> State a [a]
reverseAndMinimum' [ ] res = res
reverseAndMinimum' (x:xs) res = do
        smallestSoFar <- get
        when (x < smallestSoFar) (put x)
        reverseAndMinimum' xs ((x:) <$> res)

reverseAndMinimum :: Ord a => [a] -> ([a], a)
reverseAndMinimum [ ] = error "StateSort.reverseAndMinimum: This branch is unreachable."
reverseAndMinimum xs@(x:_) = runState (reverseAndMinimum' xs (return [ ])) x

It traverses its argument only once, yet it is about 30% slower than a naive function that traverses it twice:

reverseAndMinimum_naive :: Ord a => [a] -> ([a], a)
reverseAndMinimum_naive xs = (reverse xs, minimum xs)

It also allocates about 57% more memory.

Here are the relevant extracts from a run with +RTS -s:

reverseAndMinimum

     176,672,280 bytes allocated in the heap
...

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.064s  (  0.063s elapsed)
  GC      time    0.311s  (  0.311s elapsed)
  EXIT    time    0.005s  (  0.005s elapsed)
  Total   time    0.379s  (  0.380s elapsed)

  %GC     time      82.0%  (82.0% elapsed)

reverseAndMinimum_naive

     112,058,976 bytes allocated in the heap
...

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.041s  (  0.040s elapsed)
  GC      time    0.245s  (  0.245s elapsed)
  EXIT    time    0.005s  (  0.005s elapsed)
  Total   time    0.291s  (  0.291s elapsed)

  %GC     time      84.2%  (84.3% elapsed)

What is happening here, how can I diagnose it, and is it possible to improve it?

P.S. A handy main for running tests:

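import Control.DeepSeq (force)
import Control.Exception (evaluate)
import System.Environment (getArgs)

main = do
    top <- (read :: String -> Int) . (!! 0) <$> getArgs
    val <- evaluate . force $ reverseAndMinimum (take top [top, top - 1 .. 1 :: Int])
    print $ (\x -> (last . fst $ x, snd x)) $ val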

Why is `reverseAndMinimum'` listed? Is it used at all? Also, what input did you use for the tests? — Mark Seemann, Mar 08 '18 at 11:02
@MarkSeemann I lost an apostrophe. Will fix shortly. I used ascending and descending lists of length about 10^5. — Ignat Insarov, Mar 08 '18 at 11:10
How are you running it? Is it compiled with optimizations? (benchmarking with no optimizations is kind of meaningless since inlining is a big part of Haskell performance.) Are you using strict or lazy state? Strict gets much fewer allocations, though it's still slower than the naive version. — Li-yao Xia, Mar 08 '18 at 11:22
@Li-yaoXia Yes, it is compiled with optimizations. I used lazy state; the strict takes exactly 2/3 as much space, which is great − but the processor cycles spent actually doing work appear the same. — Ignat Insarov, Mar 08 '18 at 11:33
I was confused because I didn't get measurements in the same order of magnitude (but the same relative ratio). I get the 300MB number with lists of length 10^(6). Also, `[1, 1..]` is a constant list. — Li-yao Xia, Mar 08 '18 at 12:24
@Li-yaoXia Yes, it's constant, but I don't think it matters. I tried all kinds of lists. The proportion of values that are lower than the current minimum must be determining how often the state gets changed, but the impact of this seems to be minor: ascending and descending lists give very similar performance. — Ignat Insarov, Mar 08 '18 at 12:33

Li-yao Xia · Accepted Answer · 2018-03-08T13:34:27.423

EDIT: this refers to a previous version of the question where reverseAndMinimum :: Ord a => [a] -> State a ([a] -> [a]).

In the naive version, reverse is efficient because it builds the reversed list directly in an accumulator while traversing the input, and minimum is never computed because, under lazy evaluation, its result is not demanded.

The "one pass" reverseAndMinimum allocates a difference list, which must then be applied to yield an actual list, and that list is traversed again to find its last element.
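
To make the difference concrete, here is a minimal sketch of the two accumulation styles. The names reverseDL and reverseAcc are made up for illustration and do not appear in the question. The first builds a chain of composed functions that has to be applied at the end; the second conses onto a plain list, the way reverse does.

```haskell
-- Difference-list style: the accumulator is a function [a] -> [a].
-- Every step allocates a new closure, and the whole chain has to be
-- applied to [] at the end before any list cell exists.
reverseDL :: [a] -> [a]
reverseDL xs = go xs id
  where
    go []     k = k []
    go (y:ys) k = go ys ((y :) . k)

-- Plain accumulator style: each step allocates exactly one cons cell
-- and the result is ready as soon as the input is exhausted.
reverseAcc :: [a] -> [a]
reverseAcc xs = go xs []
  where
    go []     acc = acc
    go (y:ys) acc = go ys (y : acc)
```

Both return the same list; the difference is only in what gets allocated along the way.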

Replicating the accumulator technique used in reverse, the following code compiles to a tight loop that computes the reverse and the minimum of a list in one pass.

import Control.Monad.State.Strict

reverseAndMinimum :: Ord a => [a] -> ([a], a)
reverseAndMinimum [ ] = error "Empty list!"
reverseAndMinimum (x:xs) = runState (reverseAndMinimum' xs [x]) x

reverseAndMinimum' :: Ord a => [a] -> [a] -> State a [a]
reverseAndMinimum' [ ] acc = return acc
reverseAndMinimum' (x:xs) acc = do
    smallestSoFar <- get
    when (x < smallestSoFar) (put x)
    reverseAndMinimum' xs (x : acc)
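
For comparison, the same accumulator idea can also be sketched without State, as a strict left fold over a pair. This is only an illustration (reverseAndMinimumFold is a hypothetical name), and I have not benchmarked it against the versions above.

```haskell
import Data.List (foldl')

-- One pass: carry the reversed prefix and the running minimum together.
reverseAndMinimumFold :: Ord a => [a] -> ([a], a)
reverseAndMinimumFold []     = error "reverseAndMinimumFold: empty list"
reverseAndMinimumFold (x:xs) = foldl' step ([x], x) xs
  where
    -- Force the new minimum before building the pair, so that no chain
    -- of unevaluated `min` thunks accumulates inside the tuple.
    step (acc, m) y =
      let m' = min m y
      in m' `seq` (y : acc, m')
```

For example, reverseAndMinimumFold [3, 1, 2] evaluates to ([2, 1, 3], 1).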

edited Mar 08 '18 at 13:34

answered Mar 08 '18 at 12:15


I went on and got rid of dlist, borrowing from [`Prelude.reverse`](http://hackage.haskell.org/package/base-4.10.0.0/docs/src/GHC.List.html#reverse) instead. It still runs slower than the naive implementation by about a third. Even if I print the minimum. (I actually `evaluate . force` everything now.) Can you produce code that achieves speedup? – Ignat Insarov Mar 08 '18 at 13:09
I updated the post with, hopefully, more precise measurements, and the new code. Kindly take a look. – Ignat Insarov Mar 08 '18 at 13:23
You almost got it! The accumulator needed to just be a list. I updated my answer. – Li-yao Xia Mar 08 '18 at 13:35
Oh! This is actually **faster** than the naive variant **by** almost **23%!** – Ignat Insarov Mar 08 '18 at 13:48
BTW there is a library for single-pass folds: [foldl](https://hackage.haskell.org/package/foldl). – Li-yao Xia Mar 08 '18 at 13:54
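
The idea behind single-pass fold libraries can be sketched in a few lines of plain Haskell. What follows is a simplified, self-contained illustration and not the foldl package's actual code; Fold, Pair, runFold, revListF and minimumF are names assumed here. Two folds are combined applicatively so that both run over the list in a single traversal.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')

-- A fold packaged as: step function, initial state, final extraction.
data Fold a b = forall s. Fold (s -> a -> s) s (s -> b)

-- Strict pair, so both component states are forced on every step.
data Pair x y = Pair !x !y

instance Functor (Fold a) where
  fmap f (Fold step begin done) = Fold step begin (f . done)

instance Applicative (Fold a) where
  pure b = Fold (\() _ -> ()) () (\() -> b)
  (Fold stepL beginL doneL) <*> (Fold stepR beginR doneR) =
    Fold step (Pair beginL beginR) done
    where
      -- Feed each element to both folds at once.
      step (Pair l r) a = Pair (stepL l a) (stepR r a)
      done (Pair l r)   = doneL l (doneR r)

-- Run a fold over a list with a strict left fold.
runFold :: Fold a b -> [a] -> b
runFold (Fold step begin done) = done . foldl' step begin

-- Collect the elements in reverse order.
revListF :: Fold a [a]
revListF = Fold (flip (:)) [] id

-- Running minimum; Nothing for an empty input.
minimumF :: Ord a => Fold a (Maybe a)
minimumF = Fold step Nothing id
  where
    step Nothing  y = Just y
    step (Just m) y = Just $! min m y

-- Both results from one traversal of the input.
reverseAndMinimumF :: Ord a => [a] -> ([a], Maybe a)
reverseAndMinimumF = runFold ((,) <$> revListF <*> minimumF)
```

Unlike the versions in the answer, this sketch returns the minimum wrapped in Maybe, so the empty list gives ([], Nothing) instead of an error.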
