How to build an infinite tree with duplicate elimination via cache of weak pointers in Haskell

Question

The following code builds an infinite tree while also maintaining a cache of all subtrees, so that no duplicate subtrees are created. The rationale for eliminating duplicate subtrees comes from the application to state trees of chess-like games: one can often reach the same game state just by changing the order of two moves. As the game progresses, states that become inaccessible should not continue to take up memory. I thought I could solve that problem with weak pointers. Unfortunately, using weak pointers brings us into the `IO` monad, and this seems to have destroyed enough (or all) of the laziness that this code no longer terminates.

My question is thus: Is it possible to efficiently generate a lazy (game state) tree without duplicate subtrees (and without leaking memory)?

{-# LANGUAGE RecursiveDo #-}

import Prelude hiding (lookup)
import Data.Map.Lazy (Map, empty, lookup, insert)
import Data.List (transpose)

import Control.Monad.State.Lazy (StateT(..))
import System.Mem.Weak
import System.Environment

type TreeCache = Map Integer (Weak NTree)

data Tree a = Tree a [Tree a]
type Node = (Integer, [Integer])
type NTree = Tree Node

getNode (Tree a _) = a
getVals = snd . getNode

makeTree :: Integer -> IO NTree
makeTree n = fst <$> runStateT (makeCachedTree n) empty

makeCachedTree :: Integer -> StateT TreeCache IO NTree
makeCachedTree n = StateT $ \s -> case lookup n s of
  Nothing -> runStateT (makeNewTree n) s -- makeNewTree n s
  Just wt -> deRefWeak wt >>= \mt -> case mt of
    Nothing -> runStateT (makeNewTree n) s
    Just t -> return (t,s)

makeNewTree :: Integer -> StateT TreeCache IO NTree
makeNewTree n = StateT $ \s -> mdo
  wt <- mkWeak n t Nothing
  (ts, s') <- runStateT
              (mapM makeCachedTree $ children n)
              (insert n wt s)
  let t = Tree (n, values n $ map getVals ts) ts
  return (t, s')

children n = let bf = 10 in let hit = 2 in [bf*n..bf*n+bf+hit-1]

values n [] = repeat n
values n nss = n:maximum (transpose nss)

main = do
  args <- getArgs
  -- Check the argument count before parsing or building anything.
  case args of
    [arg] -> do
      let n = read arg
      t <- makeTree n
      putStr $ show $ take (fromInteger n) $ getVals t
    _ -> putStr "One argument only!!!"
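
One likely reason for the non-termination, sketched on a smaller example: `IO` actions run in sequence, so a `mapM` over the children has to finish building every (infinite) subtree before the parent is returned. The names `kids`, `pureTree`, `strictIoTree`, `deferredIoTree` and `leftSpine` below are hypothetical and not part of the code above, and `unsafeInterleaveIO` is shown only as one possible way to defer the work, not as a verified fix for the weak-pointer cache.

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

data Tree a = Tree a [Tree a]

-- Hypothetical child function: every node has two children, so the tree is infinite.
kids :: Integer -> [Integer]
kids n = [2 * n, 2 * n + 1]

-- Pure construction: each subtree is a thunk that is built only when inspected.
pureTree :: Integer -> Tree Integer
pureTree n = Tree n (map pureTree (kids n))

-- Sequenced IO construction: mapM runs every child action to completion
-- before the parent is returned, so this never returns on an infinite tree.
strictIoTree :: Integer -> IO (Tree Integer)
strictIoTree n = Tree n <$> mapM strictIoTree (kids n)

-- Deferred IO construction: unsafeInterleaveIO postpones each node's action
-- until its result is demanded, so only the inspected part is ever built.
deferredIoTree :: Integer -> IO (Tree Integer)
deferredIoTree n = unsafeInterleaveIO (Tree n <$> mapM deferredIoTree (kids n))

-- Labels of the first k nodes along the leftmost path.
leftSpine :: Int -> Tree a -> [a]
leftSpine k (Tree a ts)
  | k <= 0    = []
  | otherwise = a : concatMap (leftSpine (k - 1)) (take 1 ts)
```

In GHCi, `leftSpine 4 (pureTree 1)` and `leftSpine 4 <$> deferredIoTree 1` both return after touching four nodes, while `strictIoTree 1` does not return. Deferring effects this way also makes it unpredictable when they run, which matters once a cache is threaded through the construction as state.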

I don't think weak pointers (hence `IO`) will be necessary. For example, `Data.Seq` goes to great lengths to maximize internal sharing using some clever code: https://hackage.haskell.org/package/containers-0.5.7.1/docs/src/Data.Sequence.html#applicativeTree — cdk, May 12 '16 at 16:27
@cdk, I cannot quite find where/how it does that. From the comment "Special note: the Identity specialization automatically does node sharing, reducing memory usage of the resulting tree to /O(log n)/", it seems to be implied that the code does not implement sharing explicitly, but that compiler optimizations for a specific specialization make it happen anyway (implying that for other cases it may not happen). Could you explain a bit more how Data.Seq does sharing and how that would help me? — hkBst, May 19 '16 at 05:59
@hkBst, that comment is misleading, and I'll try to edit it to clarify in the next version. The compiler specialization to `Identity` does nothing but improve constant factors. What it really means is that *when used with `Identity`* there's lots of sharing. — dfeuer, May 25 '16 at 14:44
I don't see what you're trying to do with weak pointers. If you change your game state to point at a child of the state tree root representing the selected move, then the other subtrees should become inaccessible and be dropped by the garbage collector. — dfeuer, May 25 '16 at 14:50
@dfeuer, but the cache will still contain a reference to the previous state, preventing garbage collection. — hkBst, May 25 '16 at 14:56
My bounty is about to expire... I would prefer to award it to even a partial answer or hint rather than letting it go to waste... — hkBst, May 30 '16 at 06:21
If the game of chess is your motivation, then a game subtree is fully determined by its starting position and you need to eliminate duplicate nodes rather than subtrees. — n. m. could be an AI, Oct 02 '17 at 21:17

score 0 · Answer 1 · answered Oct 02 '17 at 21:06

My question is thus: Is it possible to efficiently generate a lazy (game state) tree without duplicate subtrees (and without leaking memory)?

No. Essentially what you're trying to do is use transposition tables to memoize your tree search function (e.g. negascout). The problem is that games have an exponential number of states and maintaining a transposition table which remembers all the transpositions of the state space is infeasible. You just don't have that much memory.

For example, Chess has a state space complexity of 10^47. That's several orders of magnitude greater than all the computer memory available in the entire world. Of course, you can reduce this amount by not storing reflections (Chess has 8 reflection symmetries). Furthermore, many of these transpositions would be unreachable due to pruning of the game tree. However, the state space is so intractable that you would still never be able to store every transposition.

Most programs use a transposition table of a fixed size. When two transpositions hash to the same value, a replacement scheme decides which entry is kept and which one is discarded. This is a tradeoff: you keep the value which you think will be most useful (i.e. most visited or closer to the root node) at the cost of having to traverse the other transposition again. The point is that it's impossible to generate a game tree which has no duplicate subtrees unless you're from a sufficiently advanced alien civilization.
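
As a rough illustration of such a fixed-size table, here is a minimal sketch. The `Entry` fields, the function names, and the depth-preferred replacement rule (keep the entry with the greater remaining search depth, new entry wins ties) are assumptions chosen for the example, not taken from any particular engine. The table size is assumed to be positive.

```haskell
import qualified Data.IntMap.Strict as IM

-- Hypothetical table entry.
data Entry = Entry
  { entryKey   :: Int  -- full position hash, used to detect slot collisions
  , entryDepth :: Int  -- remaining search depth when the position was evaluated
  , entryScore :: Int  -- stored evaluation
  } deriving (Eq, Show)

-- A table with at most tableSize slots (tableSize must be positive).
data Table = Table
  { tableSize  :: Int
  , tableSlots :: IM.IntMap Entry
  }

emptyTable :: Int -> Table
emptyTable size = Table size IM.empty

-- Map a position hash to its slot; different hashes may share a slot.
slotOf :: Table -> Int -> Int
slotOf t key = key `mod` tableSize t

-- Depth-preferred replacement: on a collision keep the entry that was
-- searched deeper; the new entry wins ties.
store :: Entry -> Table -> Table
store new t =
  t { tableSlots = IM.insertWith keepDeeper (slotOf t (entryKey new)) new (tableSlots t) }
  where
    keepDeeper n old
      | entryDepth n >= entryDepth old = n
      | otherwise                      = old

-- A probe succeeds only if the slot holds exactly this position.
probe :: Int -> Table -> Maybe Entry
probe key t = case IM.lookup (slotOf t key) (tableSlots t) of
  Just e | entryKey e == key -> Just e
  _                          -> Nothing
```

With `emptyTable 4`, the keys 1, 5 and 9 all land in slot 1, so storing one of them can evict another, and a later `probe` for the evicted position returns `Nothing`. That position then has to be searched again, which is the cost side of the tradeoff.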

The point of doing it _lazily_ is that you can limit the amount of memory used by limiting traversal. The question is whether you can do it lazily and also _detect and eliminate duplicates dynamically_. — hkBst, Oct 15 '17 at 13:19
First, laziness doesn't magically reduce your memory requirements. Second, most game tree search algorithms are depth-first searches, which only take linear memory anyway. Third, most Chess engines (even the ones written in C) have lazy move generators (i.e. they only produce one move at a time instead of all at once). Fourth, detecting and eliminating duplicates requires an exponential amount of memory whether you write your program in C or in Haskell. Chess is an EXPSPACE problem and there's nothing that you can do to improve that. — Aadit M Shah, Oct 15 '17 at 17:30
Fifth, of course you can have both laziness and transposition tables. Most chess engines do. However, the transposition tables can't store every single transposition and hence you can't eliminate every duplicate node. Laziness doesn't help reduce this. That's not how laziness works. — Aadit M Shah, Oct 15 '17 at 17:32
Even so, transposition tables are used by chess engines, as you yourself note... My question is firstly a Haskell question and only secondly a chess engine question; if your answer does not help me understand why my program is not lazy, then it does not answer my question. — hkBst, Oct 19 '17 at 10:38
