How to Avoid Lifted Values in Worker of Tuple-Valued Recursive Function?

Question

I have the following minimal example:

import Test.Tasty.Bench

{-# INLINE loop #-}
loop :: Int -> Int -> Int
loop a 0 = a
loop a n = loop (a + x + y) (n - 1) where (x, y) = foo n

{-# INLINE foo #-}
foo :: Int -> (Int, Int)
foo n = if n > 0 then (n + 1, n) else foo (n + 1)

main :: IO ()
main = defaultMain [bench "test" $ whnf (loop 0) (1024 * 1024)]

Running it gives me:

test: OK (0.24s)
  1.88 ms ±  88 μs,  32 MB allocated, 2.8 KB copied, 2.0 MB peak memory

I want to avoid the 32 MB heap allocation. Looking at the core dump, I find the following worker functions:

Rec {
-- RHS size: {terms: 18, types: 6, coercions: 0, joins: 0/0}
$wfoo
  = \ ww ->
      case ># ww 0# of {
        __DEFAULT -> $wfoo (+# ww 1#);
        1# -> (# I# (+# ww 1#), I# ww #)
      }
end Rec }

Rec {
-- RHS size: {terms: 26, types: 14, coercions: 0, joins: 0/0}
$wloop
  = \ ww ww1 ->
      case ww1 of ds {
        __DEFAULT ->
          case $wfoo ds of { (# ww3, ww4 #) ->
          case ww3 of { I# y ->
          case ww4 of { I# y1 -> $wloop (+# (+# ww y) y1) (-# ds 1#) }
          }
          };
        0# -> ww
      }
end Rec }

I am pretty sure that the heap allocation is caused by $wfoo returning lifted Int values. I have tried various strictness annotations to coax GHC into generating a worker function that returns unlifted values, but without success. For instance, the following causes no changes in the core dump other than renaming bound variables:

loop a n = seq x $ seq y $ loop (a + x + y) (n - 1) where (x, y) = foo n

If I drop the second component of the tuple and have foo only return a single Int, GHC immediately removes all the lifted Int values and the resulting program uses no heap allocation.

I have also been able to avoid the lifted Int values by using a datatype with strict fields like data Pair = Pair !Int !Int (although I have not been able to do this with a polymorphic strict pair). Curiously enough, this datatype does not even appear in the core, which uses only unboxed tuples of unlifted values.

In my production code, I need the function to work with tuples, so these solutions do not work for me. Since the datatype with strict fields ends up getting erased anyways, it seems to me that it only serves as a convoluted way of making strictness annotations. I assume that there is a more direct way to make those same strictness annotations that get GHC to generate a worker function that returns unlifted values.

How can I make GHC generate a worker function for foo that returns unlifted values and thus avoid the heap allocation associated with lifted values?

I am using GHC 8.10.7 and LLVM 13.0.1. Built with -O2 -fllvm -optlo-O3, run with +RTS -T.

What version of GHC, and what's your build/run look like? I can't repro this with ghc 9.4.2, a trivial cabal file that depends only on `base` and `tasty-bench`, and run with `cabal run all -- +RTS -T`. It reports 0 allocation basically no matter what I do to your minimal example. — Carl, Oct 16 '22 at 18:00
@Carl Ah, I should have really included that information, I added it to the question just now. I did not imagine that something this basic would depend on the GHC version. However, I just tried with GHC 9.4.2 and indeed, the problem does not arise there. I tried every version between 8.10.7 and 9.4.2 that is on GHCup, and it seems the behavior went away somewhere between 9.2.4 and 9.4.2. Unfortunately, 9.4.2 introduces other performance regressions in my project, so I am not sure what to do now. :/ — Jules, Oct 16 '22 at 18:56
I can't seem to find a way to coax GHC 8.10.7 to unbox things on its own in this pattern. The most I can suggest is finding a way to refactor things so that you can have a manually unboxed worker with a wrapper that does boxing as expected by the rest of the application. — Carl, Oct 16 '22 at 20:09
Why do you need the production code to work with tuples? Would it be enough to CPS it, so that callers that want tuples can pass `(,)` as the first argument? Alternately, would it be enough to CPS it as a(n exported) helper, and offer `foo = fooHelper (,)` for backwards compatibility? — Daniel Wagner, Oct 16 '22 at 22:28
@DanielWagner I think my brain was pretty dead when I posted this. I am using the tuple-valued function in a StateT newtype, but you are right, I could just have an intermediate function with the strictness-enforcing datatype. However, I think rather than spending my time on workarounds for compiler shortcomings, I will spend it on migrating to GHC 9.4.2, which I want to switch to eventually anyways. — Jules, Oct 24 '22 at 12:43

How to Avoid Lifted Values in Worker of Tuple-Valued Recursive Function?

0 Answers0