Note: This post is written in literate Haskell. Copy it into a file, save it as *.lhs, and compile/load with in GHC(i). Also, I started writing this answer before you've edited your code, but the lesson stays the same.
TL;DR
The Prelude
function uncurry
is too lazy, whereas your pattern match is just strict enough.
A word of caution and a disclaimer
We're entering a magical, weird place. Beware. Also, my CORE abilities are rudimentary. Now that I've lost all my credibility, let's get started.
The tested code
In order to know where we get the additional memory requirements, it's useful to have more than two functions.
> import Control.Monad (forM_)
This is your original, non-pointfree variant:
> matches :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matches f cs = foldr (\(cs', n) a -> fromEnum (f cs cs') * n + a) 0
This is a variant that's only slightly point-free, the parameter a
is eta-reduced.
> matchesPF' :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matchesPF' f cs = foldr (\(cs', n) -> (+) (fromEnum (f cs cs') * n)) 0
This is a variant that inlines uncurry
by hand.
> matchesPFI :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matchesPFI f cs = foldr ((+) . (\(cs', n) -> fromEnum (f cs cs') * n)) 0
This your pointfree version.
> matchesPF :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matchesPF f cs = foldr ((+) . uncurry ((*) . fromEnum . f cs)) 0
This is a variant that uses a custom uncurry
, see below.
> matchesPFU :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matchesPFU f cs = foldr ((+) . uncurryI ((*) . fromEnum . f cs)) 0
This is a variant that uses a custom lazy uncurry
, see below.
> matchesPFL :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matchesPFL f cs = foldr ((+) . uncurryL ((*) . fromEnum . f cs)) 0
To test the functions easily, we use a list:
> funcs = [matches, matchesPF', matchesPF, matchesPFL, matchesPFU, matchesPFI]
Our self-written uncurry
:
> uncurryI :: (a -> b -> c) -> (a, b) -> c
> uncurryI f (a,b) = f a b
A lazier uncurry
:
> uncurryL :: (a -> b -> c) -> (a, b) -> c
> uncurryL f p = f (fst p) (snd p)
The lazy variant uncurryL
has the same semantics as the variant in Prelude
, e.g.
uncurry (\_ _ -> 0) undefined == 0 == uncurryL (\_ _ -> 0) undefined
whereas uncurryI
is strict in the pair's spine.
> main = do
> let f a b = a < b
> forM_ [1..10] $ \i ->
> forM_ funcs $ \m ->
> print $ m f i (zip (cycle [1..10]) [1..i*100000])
The list [1..i*100000]
depends on i
deliberately, so that we don't introduce a CAF and skew our allocation profile.
The desugared code
Before we delve into the profile, let's have a look at the desugared code of each function:
==================== Desugar (after optimization) ====================
Result size of Desugar (after optimization)
= {terms: 221, types: 419, coercions: 0}
uncurryL
uncurryL = \ @ a @ b @ c f p -> f (fst p) (snd p)
uncurryI
uncurryI = \ @ a @ b @ c f ds -> case ds of _ { (a, b) -> f a b }
-- uncurried inlined by hand
matchesPFI =
\ @ a f cs ->
foldr
$fFoldable[]
(. (+ $fNumInt)
(\ ds ->
case ds of _ { (cs', n) ->
* $fNumInt (fromEnum $fEnumBool (f cs cs')) n
}))
(I# 0)
-- lazy uncurry
matchesPFL =
\ @ a f cs ->
foldr
$fFoldable[]
(. (+ $fNumInt)
(uncurryL (. (* $fNumInt) (. (fromEnum $fEnumBool) (f cs)))))
(I# 0)
-- stricter uncurry
matchesPFU =
\ @ a f cs ->
foldr
$fFoldable[]
(. (+ $fNumInt)
(uncurryI (. (* $fNumInt) (. (fromEnum $fEnumBool) (f cs)))))
(I# 0)
-- normal uncurry
matchesPF =
\ @ a f cs ->
foldr
$fFoldable[]
(. (+ $fNumInt)
(uncurry (. (* $fNumInt) (. (fromEnum $fEnumBool) (f cs)))))
(I# 0)
-- eta-reduced a
matchesPF' =
\ @ a f cs ->
foldr
$fFoldable[]
(\ ds ->
case ds of _ { (cs', n) ->
+ $fNumInt (* $fNumInt (fromEnum $fEnumBool (f cs cs')) n)
})
(I# 0)
-- non-point-free
matches =
\ @ a f cs ->
foldr
$fFoldable[]
(\ ds a ->
case ds of _ { (cs', n) ->
+ $fNumInt (* $fNumInt (fromEnum $fEnumBool (f cs cs')) n) a
})
(I# 0)
So far, everything seems well. There's nothing surprising going on. Typeclass functions are replaced with their dictionary variants, e.g. foldr
becomes
foldr $fFoldable[]`, since we call it on a list.
The profile
Mon Jul 18 15:47 2016 Time and Allocation Profiling Report (Final)
Prof +RTS -s -p -RTS
total time = 1.45 secs (1446 ticks @ 1000 us, 1 processor)
total alloc = 1,144,197,200 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
matchesPF' Main 13.6 0.0
matchesPF Main 13.3 11.5
main.\.\ Main 11.8 76.9
main.f Main 10.9 0.0
uncurryL Main 9.5 11.5
matchesPFU Main 8.9 0.0
matchesPFI Main 7.3 0.0
matches Main 6.9 0.0
matchesPFL Main 6.3 0.0
uncurryI Main 5.3 0.0
matchesPF'.\ Main 2.6 0.0
matchesPFI.\ Main 2.0 0.0
matches.\ Main 1.5 0.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 44 0 0.0 0.0 100.0 100.0
main Main 89 0 0.0 0.0 100.0 100.0
main.\ Main 90 10 0.0 0.0 100.0 100.0
main.\.\ Main 92 60 11.8 76.9 100.0 100.0
funcs Main 93 0 0.0 0.0 88.2 23.1
matchesPFI Main 110 10 7.3 0.0 11.7 0.0
matchesPFI.\ Main 111 5500000 2.0 0.0 4.4 0.0
main.f Main 112 5500000 2.4 0.0 2.4 0.0
matchesPFU Main 107 10 8.9 0.0 15.3 0.0
uncurryI Main 108 5500000 5.3 0.0 6.4 0.0
main.f Main 109 5500000 1.1 0.0 1.1 0.0
matchesPFL Main 104 10 6.3 0.0 17.7 11.5
uncurryL Main 105 5500000 9.5 11.5 11.4 11.5
main.f Main 106 5500000 1.9 0.0 1.9 0.0
matchesPF Main 102 10 13.3 11.5 15.4 11.5
main.f Main 103 5500000 2.1 0.0 2.1 0.0
matchesPF' Main 99 10 13.6 0.0 17.2 0.0
matchesPF'.\ Main 100 5500000 2.6 0.0 3.6 0.0
main.f Main 101 5500000 1.0 0.0 1.0 0.0
matches Main 94 10 6.9 0.0 10.9 0.0
matches.\ Main 97 5500000 1.5 0.0 4.0 0.0
main.f Main 98 5500000 2.5 0.0 2.5 0.0
CAF Main 87 0 0.0 0.0 0.0 0.0
funcs Main 91 1 0.0 0.0 0.0 0.0
main Main 88 1 0.0 0.0 0.0 0.0
main.\ Main 95 0 0.0 0.0 0.0 0.0
main.\.\ Main 96 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 84 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 78 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 76 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.Text 75 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 59 0 0.0 0.0 0.0 0.0
Ignore the main\.\.
noise, it's just the list. However, there's one point that one should notice immediately: matchesPF
and uncurryL
use the same alloc%
:
matchesPF Main 13.3 11.5
uncurryL Main 9.5 11.5
Getting to the CORE
Now it's time to inspect the resulting CORE (ghc -ddump-simpl
). We'll notice that most of the functions have been transformed into worker wrappers, and they look more or less the same (-dsuppress-all -dsuppress-uniques
):
$wa5
$wa5 =
\ @ a1 w w1 w2 ->
letrec {
$wgo
$wgo =
\ w3 ->
case w3 of _ {
[] -> 0;
: y ys ->
case y of _ { (cs', n) ->
case $wgo ys of ww { __DEFAULT ->
case w w1 cs' of _ {
False -> case n of _ { I# y1 -> ww };
True -> case n of _ { I# y1 -> +# y1 ww }
}
}
}
}; } in
$wgo w2
This is your usual worker-wrapper. $wgo
takes a list, checks whether it's empty, is strict in the head (case y of _ { (cs', n) ->…
) and lazy in the recursive result $wgo ys of ww
.
All functions look the same. Well, all except matchesPF
(your variant)
-- matchesPF
$wa3 =
\ @ a1 w w1 w2 ->
letrec {
$wgo =
\ w3 ->
case w3 of _ {
[] -> 0;
: y ys ->
case $wgo ys of ww { __DEFAULT ->
case let {
x = case y of _ { (x1, ds) -> x1 } } in
case w w1 x of _ {
False ->
case y of _ { (ds, y1) -> case y1 of _ { I# y2 -> main13 } };
-- main13 is just #I 0
True -> case y of _ { (ds, y1) -> y1 }
}
of _ { I# x ->
+# x ww
}
}
}; } in
$wgo w2
and matchesPFL
(the variant that uses the lazy uncurryL
)
-- matchesPFL
$wa2
$wa2 =
\ @ a1 w w1 w2 ->
letrec {
$wgo =
\ w3 ->
case w3 of _ {
[] -> 0;
: y ys ->
case $wgo ys of ww { __DEFAULT ->
case snd y of ww1 { I# ww2 ->
case let {
x = fst y } in
case w w1 x of _ {
False -> main13;
True -> ww1
}
of _ { I# x ->
+# x ww
}
}
}
}; } in
$wgo w2
They are virtually the same. And both of them contain let
bindings. This will create a thunk and usually lead to worse space requirements.
The solution
I think the culprit at this point is clear. It is uncurry
. GHC wants to enforce the correct semantics of
uncurry (const (const 0)) undefined
However, this adds laziness and additional thunks. Your non-pointfree variant doesn't introduce that behaviour, since you pattern match on the pair:
foldr (\(cs', n) a -> …)
Still don't trust me? Use a lazy pattern match
foldr (\ ~(cs', n) a -> …)
and you will notice that matches
will behave the same as matchesPF
. So use slightly stricter variant of uncurry
. uncurryI
is enough to give the strictness analyzer a hint.
Note that pairs are notorious for this behaviour. RWH spents a whole chapter trying to optimize the behaviour of a single function where intermediate pairs lead to problems.