
I have a very large decision tree. It is used as follows:

-- once per application start
t :: Tree
t = buildDecisionTree
-- done several times
makeDecision :: Something -> Decision
makeDecision something = search t something
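To make this setup concrete, here is a minimal runnable sketch. The question does not show `Tree`, `Something`, `Decision`, `buildDecisionTree` or `search`, so everything below is a hypothetical stand-in: an infinite binary tree whose nodes are labelled with their breadth-first index, and a path of `Bool`s to follow. Only the nodes along a searched path are ever built.

```haskell
-- Hypothetical stand-ins for the question's types.
type Something = [Bool]   -- a path: False = go left, True = go right
type Decision  = Int

-- A conceptually infinite binary tree; each node carries a decision.
data Tree = Node Decision Tree Tree

-- Builds the tree lazily: children are only constructed when inspected.
buildDecisionTree :: Tree
buildDecisionTree = go 1
  where
    go n = Node n (go (2 * n)) (go (2 * n + 1))

-- Follows the path, forcing only the nodes that lie on it.
search :: Tree -> Something -> Decision
search (Node d _ _) []           = d
search (Node _ l _) (False : ps) = search l ps
search (Node _ _ r) (True : ps)  = search r ps

-- Top-level constant: nodes forced by one call stay evaluated (and in
-- memory) for later calls, as long as makeDecision can still reach t.
t :: Tree
t = buildDecisionTree

makeDecision :: Something -> Decision
makeDecision something = search t something

main :: IO ()
main = do
  print (makeDecision [False, True])        -- 5
  print (makeDecision [True, True, False])  -- 14
```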

This decision tree is far too large to fit in memory, but thanks to lazy evaluation it is only ever partially evaluated.

The problem is that there are scenarios in which all possible decisions are tried, causing the whole tree to be evaluated. This is not going to terminate, but it should not cause a memory overflow either. Furthermore, if this process is aborted, memory usage does not decrease, because the huge subtree that has already been evaluated is still held in memory.

One solution would be to re-evaluate the tree every time makeDecision is called, but this would lose the benefit of caching decisions and significantly slow down makeDecision.
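A sketch of that "re-evaluate every time" alternative, using a hypothetical infinite tree as a stand-in for the real one. One caveat worth knowing: GHC's full-laziness optimisation (enabled by -O) may float `buildDecisionTree 1` out of `makeDecision` and thereby reintroduce exactly the sharing we are trying to avoid, so the sketch disables it for this module. Every call then rebuilds the nodes along its path, which is the slowdown described above.

```haskell
{-# OPTIONS_GHC -fno-full-laziness #-}
-- Without full laziness GHC will not float `buildDecisionTree 1` out of
-- makeDecision, so each call should get a fresh, unshared tree.

type Something = [Bool]   -- hypothetical: False = left, True = right
type Decision  = Int      -- hypothetical

data Tree = Node Decision Tree Tree

-- Builds the subtree rooted at label n, lazily.
buildDecisionTree :: Decision -> Tree
buildDecisionTree n =
    Node n (buildDecisionTree (2 * n)) (buildDecisionTree (2 * n + 1))

search :: Tree -> Something -> Decision
search (Node d _ _) []           = d
search (Node _ l _) (False : ps) = search l ps
search (Node _ _ r) (True : ps)  = search r ps

-- No top-level tree: the nodes built here become garbage after the call.
makeDecision :: Something -> Decision
makeDecision something = search (buildDecisionTree 1) something

main :: IO ()
main = mapM_ (print . makeDecision) [[False, True], [False, True]]
```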

I would like to take a middle course. In particular, it is very common in my application to make successive decisions that share a common path prefix in the tree. So I would like to cache the last used path but drop the others, causing them to be re-evaluated the next time they are used. How can I do this in Haskell?

ipsec
  • Related: http://stackoverflow.com/questions/11675807/can-a-thunk-be-duplicated-to-improve-memory-performance – shang Jan 18 '13 at 09:23
  • That is an interesting trick, @shang, thanks for sharing. – Davorak Jan 18 '13 at 11:39
  • @ipsec I would be surprised if there is an answer that does not put you in a pure monad or the IO monad. You might be able to get away with unsafePerformIO, since the interface should be pure. Would something along those lines work for you? – Davorak Jan 18 '13 at 14:16
  • I could even live with plain IO, if necessary. I can think of some tricks with IORefs. How would you implement this? – ipsec Jan 18 '13 at 16:02

1 Answer


This is not possible in pure Haskell; see the question "Can a thunk be duplicated to improve memory performance?" (as pointed out by @shang). You can, however, do it with IO.

We start with the module header. It exports only the type and the functions that, taken together, make this module (which uses unsafePerformIO internally) safe to use; in particular, the ReEval constructor is not exported. It is also possible to do this without unsafePerformIO, but then the user would have to keep more of their code in IO.

{-# LANGUAGE ExistentialQuantification #-}
module ReEval (ReEval, newReEval, readReEval, resetReEval) where

import Data.IORef
import System.IO.Unsafe

Next, we define a data type that stores a value in a way that prevents all sharing: it keeps the function and the argument apart and applies the function only when we want the value. Note that the value returned by unsharedValue can be shared, but not with the return values of other invocations (assuming the function does something non-trivial):

data Unshared a = forall b. Unshared (b -> a) b

unsharedValue :: Unshared a -> a
unsharedValue (Unshared f x) = f x
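A small sketch contrasting an ordinary let-bound thunk with an `Unshared` value. `expensive` is a hypothetical stand-in that prints a trace message each time it is actually evaluated. The behaviour described in the comments assumes the program is run unoptimised (runhaskell or GHCi); with -O, GHC's common-subexpression elimination may merge the repeated applications.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Debug.Trace (trace)

-- Same definitions as in the answer.
data Unshared a = forall b. Unshared (b -> a) b

unsharedValue :: Unshared a -> a
unsharedValue (Unshared f x) = f x

-- Hypothetical expensive computation that announces each evaluation.
expensive :: Integer -> Integer
expensive n = trace ("computing " ++ show n) (n * n)

main :: IO ()
main = do
    -- An ordinary let-bound thunk is evaluated at most once.
    let v = expensive 12
    print v   -- traces "computing 12", prints 144
    print v   -- prints 144 only: v is already evaluated

    -- Each use of unsharedValue builds a new application `expensive 12`.
    let u = Unshared expensive 12
    print (unsharedValue u)   -- traces "computing 12", prints 144
    print (unsharedValue u)   -- traces "computing 12" again, prints 144
```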

Now we define our data type of resettable computations. We need to store the computation and the current value. The latter is stored in an IORef, as we want to be able to reset it.

data ReEval a = ReEval {
    calculation :: Unshared a,
    currentValue :: IORef a
    }

To wrap a value in a ReEval box, we need a function and an argument. Why not just a -> ReEval a? Because then there would be no way to prevent the parameter from being shared.

newReEval :: (b -> a) -> b -> ReEval a
newReEval f x = unsafePerformIO $ do
    let c = Unshared f x
    ref <- newIORef (unsharedValue c)
    return $ ReEval c ref
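To illustrate why a plain `a -> ReEval a` would not help, here is a sketch with a bare IORef; the "resets" are hypothetical and not part of the module's API. If all we hold is the value `v`, the only thing we can write back is that same thunk, which stays evaluated. Keeping function and argument apart lets a reset create a fresh, unevaluated application. As before, the traced behaviour assumes an unoptimised run, and `expensive` is a hypothetical stand-in.

```haskell
import Data.IORef
import Debug.Trace (trace)

-- Hypothetical expensive computation that announces each evaluation.
expensive :: Integer -> Integer
expensive n = trace ("computing " ++ show n) (n * n)

main :: IO ()
main = do
    let v = expensive 7
    ref <- newIORef v
    readIORef ref >>= print      -- traces "computing 7", prints 49

    -- "Resetting" with only the value: the same, already evaluated thunk.
    writeIORef ref v
    readIORef ref >>= print      -- prints 49, no trace: nothing recomputed

    -- "Resetting" from function and argument: a fresh application.
    writeIORef ref (expensive 7)
    readIORef ref >>= print      -- traces "computing 7" again, prints 49
```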

Reading is simple: just get the value from the IORef. This use of unsafePerformIO is safe because we always get the value of unsharedValue c, although possibly a different “copy” of it.

readReEval :: ReEval a -> a
readReEval r = unsafePerformIO $ readIORef (currentValue r)

And finally, the resetting. I left it in the IO monad, not because wrapping it in unsafePerformIO would be any less safe than for the other functions, but because this is the easiest way to give the user control over when the reset actually happens. You don’t want to risk all your calls to resetReEval being lazily delayed until your memory has run out, or even optimized away because there is no return value to use.

resetReEval :: ReEval a -> IO ()
resetReEval r = writeIORef (currentValue r) (unsharedValue (calculation r))
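A minimal sketch of the hazard just described, using a bare IORef and a hypothetical "pure reset" built with unsafePerformIO. The write only happens when the unit value is demanded, so until then the old contents remain. This assumes an unoptimised run; with optimisation GHC is even free to move the forcing of a pure value, which is part of why resetReEval stays in IO.

```haskell
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- Returns the IORef contents before and after forcing a "pure" reset.
demo :: IO (Int, Int)
demo = do
    ref <- newIORef 0
    -- Hypothetical pure reset, in the style of wrapping resetReEval in
    -- unsafePerformIO. Nothing happens until the value is demanded.
    let pureReset = unsafePerformIO (writeIORef ref 42)
    before <- readIORef ref          -- still 0: pureReset was never forced
    pureReset `seq` return ()        -- now the write actually runs
    after <- readIORef ref           -- 42
    return (before, after)

main :: IO ()
main = demo >>= print   -- (0,42)
```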

This is the end of the module. Here is example code:

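import Debug.Trace (trace)
import ReEval

main :: IO ()
main = do
    -- func prints a trace message every time it is actually evaluated
    let func a = trace ("func " ++ show a) negate a
    -- five resettable values; nothing is computed yet
    let l = [ newReEval func n | n <- [1..5] ]
    print (map readReEval l)   -- evaluates func 1 .. func 5
    print (map readReEval l)   -- cached: no trace output
    mapM_ resetReEval l        -- drop the cached values
    print (map readReEval l)   -- recomputes func 1 .. func 5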

And here you can see that it does what is expected:

$ runhaskell test.hs 
func 1
func 2
func 3
func 4
func 5
[-1,-2,-3,-4,-5]
[-1,-2,-3,-4,-5]
func 1
func 2
func 3
func 4
func 5
[-1,-2,-3,-4,-5]
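A sketch of how ReEval might be applied to the question's use case, under loudly stated assumptions: the tree is a hypothetical infinite stand-in, `decide` is a hypothetical helper, and its policy (reset whenever the first step of the path changes) is only a coarse approximation of "keep the last used path", since resetReEval drops the whole tree rather than just the other branches. The module's definitions are copied in so the file is self-contained; the NOINLINE pragmas are an added precaution, not part of the answer.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Control.Monad (when)
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- ReEval, copied from the answer (NOINLINE pragmas added as a precaution).
data Unshared a = forall b. Unshared (b -> a) b

unsharedValue :: Unshared a -> a
unsharedValue (Unshared f x) = f x

data ReEval a = ReEval { calculation :: Unshared a, currentValue :: IORef a }

newReEval :: (b -> a) -> b -> ReEval a
newReEval f x = unsafePerformIO $ do
    let c = Unshared f x
    ref <- newIORef (unsharedValue c)
    return $ ReEval c ref
{-# NOINLINE newReEval #-}

readReEval :: ReEval a -> a
readReEval r = unsafePerformIO $ readIORef (currentValue r)
{-# NOINLINE readReEval #-}

resetReEval :: ReEval a -> IO ()
resetReEval r = writeIORef (currentValue r) (unsharedValue (calculation r))

-- Hypothetical decision tree, standing in for the real one.
type Something = [Bool]   -- False = go left, True = go right
type Decision  = Int

data Tree = Node Decision Tree Tree

buildTree :: Decision -> Tree
buildTree n = Node n (buildTree (2 * n)) (buildTree (2 * n + 1))

search :: Tree -> Something -> Decision
search (Node d _ _) []           = d
search (Node _ l _) (False : ps) = search l ps
search (Node _ _ r) (True : ps)  = search r ps

-- Hypothetical policy: keep the evaluated tree while successive paths start
-- with the same step; otherwise reset it so the old nodes can be collected.
decide :: ReEval Tree -> IORef Something -> Something -> IO Decision
decide r lastRef path = do
    lastPath <- readIORef lastRef
    when (take 1 path /= take 1 lastPath) (resetReEval r)
    writeIORef lastRef path
    return $! search (readReEval r) path   -- force the search now

main :: IO ()
main = do
    let r = newReEval buildTree 1
    lastRef <- newIORef []
    ds <- mapM (decide r lastRef) [[False, True], [False, False], [True, True, False]]
    print ds   -- [5,4,14]
```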
Joachim Breitner
  • I tried this and it worked like a charm. Unfortunately it required a lot of code changes, but I also guess that this is impossible in pure Haskell. Anyway, my problem is solved. Thank you! – ipsec Jan 19 '13 at 14:30
  • I actually believe that there is a variant of that idea without IO, but where you’d have to map a function over `l` to get a new `l` with sharing removed, but it might be tricky to use evaluation-wise. – Joachim Breitner Jan 19 '13 at 15:41