I'm curious how to optimize this code:

fun n = (sum l, f $ f0 l, g $ g0 l)
  where l = map h [1..n]
Assume that f, f0, g, g0, and h are all costly, but that the creation and storage of l is extremely expensive.
As written, l is stored until the returned tuple is fully evaluated or garbage collected. Instead, sum l, f0 l, and g0 l should all be executed whenever any one of them is executed, but f and g should be delayed.
It appears this behavior could be obtained by writing:

fun n = a `seq` b `seq` c `seq` (a, f b, g c)
  where
    l = map h [1..n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l
Or the very similar:

fun n = (a, b, c) `deepseq` (a, f b, g c)
  where ...
We could perhaps achieve the same effect by declaring a bunch of strict internal types, but that looks painful. Are there any other options?
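For instance, would bang patterns on the bindings do the same job as the seq chain above? A rough sketch, with hypothetical stand-ins for f, g, f0, g0, and h just so it compiles:

{-# LANGUAGE BangPatterns #-}
module Main where

-- Hypothetical stand-ins; the real f, g, f0, g0, and h are assumed costly.
f, g :: Double -> Double
f = sqrt
g = negate

f0, g0 :: [Double] -> Double
f0 = maximum
g0 = minimum

h :: Int -> Double
h = fromIntegral

fun :: Int -> (Double, Double, Double)
fun n = (a, f b, g c)      -- f and g stay delayed
  where
    l  = map h [1..n]
    !a = sum l             -- the bangs force a, b, and c together
    !b = f0 l              -- as soon as the result tuple is demanded
    !c = g0 l

main :: IO ()
main = print (fun 10)

If I understand let/where bang patterns correctly, this is essentially sugar for the seq chain, so it shares the same weakness: l is still built in full while sum, f0, and g0 each traverse it.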
Also, I'm obviously hoping with my inlines that the compiler fuses sum, f0, and g0 into a single loop that constructs and consumes l term by term. I could make this explicit through manual inlining (sketched below), but that'd suck. Are there ways to explicitly prevent the list l from ever being created and/or compel inlining? Pragmas that produce warnings or errors if inlining or fusion fail during compilation, perhaps?
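By manual inlining I mean roughly the following, pretending (purely hypothetically) that f0 and g0 can each be re-expressed as the step of a strict left fold, and again using stand-ins for f, g, and h:

{-# LANGUAGE BangPatterns #-}
module Main where

import Data.List (foldl')

-- Hypothetical stand-ins; assume the real h is costly and the real f0 and g0
-- have been rewritten by hand as per-element fold steps.
h :: Int -> Double
h = fromIntegral

f, g :: Double -> Double
f = sqrt
g = negate

fun :: Int -> (Double, Double, Double)
fun n = (s, f b, g c)                        -- f and g still delayed
  where
    -- a single pass over [1..n]: each h i is consumed immediately,
    -- so the list l never needs to exist as a whole
    (s, b, c) = foldl' step (0, 0, 0) [1..n]
    step (!accSum, !accF0, !accG0) i =
      let x = h i
      in (accSum + x, max accF0 x, accG0 + x * x)   -- f0 and g0 as fold steps

main :: IO ()
main = print (fun 10)

Here the fold runs the first time any of s, b, or c is demanded, and all three become available at once, but hand-fusing f0 and g0 like this is exactly what I'd like the compiler to do for me.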
As an aside, I'm curious about why seq, inline, lazy, etc. are all defined as let x = x in x in the Prelude. Is this simply to give them a definition for the compiler to override?
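For concreteness, the shape of definition I'm referring to (paraphrased from memory, so the exact module may differ):

import Prelude hiding (seq)

-- The body is just a divergent placeholder (bottom); presumably the real
-- behaviour is supplied by the compiler and this definition is never run.
seq :: a -> b -> b
seq = let x = x in x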