Amortization of functional array-doubling stack

Question

I'm playing around with the idea of a compact stack—one whose space requirements approach that of an array as its size increases. A candidate structure:

data Stack a
  = Empty
  | Zero (Stack a)
  | One !(SmallArray a) (Stack a)
  | Two !(SmallArray a) !(SmallArray a) (Stack a)
-- Invariant: the array size at depth `n` is `2^n`.

push :: a -> Stack a -> Stack a
push = pushA . pure

pushA :: SmallArray a -> Stack a -> Stack a
pushA sa Empty = One sa Empty
pushA sa (Zero more) = One sa more
pushA sa1 (One sa2 more) = Two sa1 sa2 more
pushA sa1 (Two sa2 sa3 more) = One sa1 (pushA (sa2 <> sa3) more)

pop :: Stack a -> Maybe (a, Stack a)
pop stk = do
  (sa, stk') <- popA stk
  hd <- indexSmallArrayM sa 0
  Just (hd, stk')

popA :: Stack a -> Maybe (SmallArray a, Stack a)
popA Empty = Nothing
popA (Zero more) = do
  (sa, more') <- popA more
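  -- Split sa into two halves of equal length.
  let half = sizeofSmallArray sa `quot` 2
      sa1 = cloneSmallArray sa 0 half
      sa2 = cloneSmallArray sa half half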
  Just (sa1, One sa2 more')
popA (One sa more) = Just (sa, Zero more)
popA (Two sa1 sa2 more) = Just (sa1, One sa2 more)

Some numerical experimentation suggests that I can get an O(log n) average cost per operation for a sequence of n pushes. But is it possible to analyze this structure as having O(log n) cost per push or pop? Or if not, can this be done for a similar structure? I haven't been able to find an appropriate debit invariant. The tricky case seems to be a sequence of Two nodes followed by a One node, but I may just be approaching this all wrong.
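For illustration, here is a sketch of how such an experiment could be set up. It is not the original experiment: it models only the shape of the stack (the digit at each depth) and counts the elements copied by `<>`, assuming every suspended append is eventually forced. The names `Shape`, `pushCost`, and `totalCopies` are made up for this sketch.

```haskell
import Data.List (foldl')

-- Digits of the 0/1/2 redundant binary number, least significant first.
-- The digit at position d counts arrays of size 2^d.
type Shape = [Int]

-- One push arriving at depth d: returns the new shape and the number of
-- elements copied by array appends along the carry chain.
pushCost :: Int -> Shape -> (Shape, Int)
pushCost _ []       = ([1], 0)
pushCost _ (0 : ds) = (1 : ds, 0)
pushCost _ (1 : ds) = (2 : ds, 0)
pushCost d (2 : ds) =
  -- Appending two arrays of size 2^d copies 2^(d+1) elements,
  -- and the result is pushed one level deeper.
  let (ds', c) = pushCost (d + 1) ds
  in (1 : ds', c + 2 ^ (d + 1))
pushCost _ _ = error "digit out of range"

-- Total number of elements copied by n pushes onto an empty stack.
totalCopies :: Int -> Int
totalCopies n = snd (foldl' step ([], 0) [1 .. n])
  where
    step (shape, acc) _ =
      let (shape', c) = pushCost 0 shape
      in (shape', acc + c)

main :: IO ()
main = mapM_ report [10, 100, 1000, 10000]
  where
    report n =
      putStrLn (show n ++ " pushes: "
                ++ show (fromIntegral (totalCopies n) / fromIntegral n :: Double)
                ++ " copies per push")
```

The printed averages can be compared against log2 n. This covers pushes only; it says nothing about interleaved pops.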

What do you mean by "*space requirements approach that of an array*"? — Bergi, Jun 25 '20 at 19:46
I guess you'll have a larger chance to get an answer at [CS.SE]. — Bergi, Jun 25 '20 at 19:54
@Bergi, I mean the limit as `n -> infty` of the size of the data structure holding `n` elements, divided by `n`, is 1. I've sketched the implementations. — dfeuer, Jun 25 '20 at 20:35
Ok, I guess I understood it right then, but wouldn't a simple list (`[a]`) achieve that (and `O(1)` push/pop) as well? I think I'm failing to see the advantage of "compacting" the list items into arrays. — Bergi, Jun 25 '20 at 20:41
Thanks for the edit, the approach is much clearer now! It does remind me a bit of a one-sided [finger tree](https://en.wikipedia.org/wiki/Finger_tree). Also, is it right to assume that `<>` and `split sa in two` are linear-time operations? The [`SmallArray` docs](https://hackage.haskell.org/package/primitive-0.7.1.0/docs/Data-Primitive-SmallArray.html) don't state this. — Bergi, Jun 25 '20 at 21:02
@Bergi The constant factors matter here. Lists cost one extra `Cons` node per element (so the limit described by dfeuer would not be 1). With vectors you only pay the cost inherent to the (boxed) element: pointer + contents. — Li-yao Xia, Jun 25 '20 at 21:28
@chepner Yes, by the invariant the top-level smallarray has only a single element. Quite genius, but might warrant an extra comment. — Bergi, Jun 25 '20 at 21:31
@Li-yaoXia Oh, I misread the `1` to mean "constant", not "exactly 1". I would've assumed that for a vector, you'd have to count the size of the pointers as well - a constant overhead per element, much smaller than for a `Cons` node but still constant. — Bergi, Jun 25 '20 at 21:35
@dfeuer Isn't there a pathological situation with an alternation of push and pop where each operation costs O(n) time because big arrays keep getting concatenated/split? — Li-yao Xia, Jun 25 '20 at 21:52
@Li-yaoXia, I *hope* not. The idea is that each operation on an "unsafe" digit (0 or 2) should change it to a "safe" digit (1). But I don't know if the math quite works out for this precise structure or not. — dfeuer, Jun 25 '20 at 22:03
@Bergi, yes, I'm only counting the cost of the pointer for a polymorphic stack. The same idea, if it works, would apply to something unboxed as well. — dfeuer, Jun 25 '20 at 22:22
The worst case I can see is `push … (Two … … (Two … … (…(One … r))))` becoming `One … (One … (One … (…(One … r))))`, being followed by a series of `pop`s until the other pathological case `pop (Zero (Zero (Zero (…(One … r)))))` becoming `One … (One … (…(One … r)))`, being again followed by a series of `push`es and so on. If we assume the unchanged `One … r` to be at depth `d`, it should be possible to calculate the cost of the entire back-and-forth. — Bergi, Jun 25 '20 at 23:11

score 3 · Answer 1 · edited Jul 14 '20 at 01:04

I believe I've figured out a way. The number system I suggested in the question turns out not to be the right one; it doesn't support O(log n) pop (or at least does not do so simply). We can patch this up by switching from 0/1/2 redundant binary to 1/2/3 redundant binary.

import Data.Primitive.SmallArray
  (SmallArray, cloneSmallArray, indexSmallArrayM, sizeofSmallArray)

-- Note the lazy field in the Two constructor.
data Stack a
  = Empty
  | One !(SmallArray a) !(Stack a)
  | Two !(SmallArray a) !(SmallArray a) (Stack a)
  | Three !(SmallArray a) !(SmallArray a) !(SmallArray a) !(Stack a)

push :: a -> Stack a -> Stack a
push = pushA . pure

pushA :: SmallArray a -> Stack a -> Stack a
pushA sa Empty = One sa Empty
pushA sa1 (One sa2 more) = Two sa1 sa2 more
pushA sa1 (Two sa2 sa3 more) = Three sa1 sa2 sa3 more
pushA sa1 (Three sa2 sa3 sa4 more) = Two sa1 sa2 (pushA (sa3 <> sa4) more)

pop :: Stack a -> Maybe (a, Stack a)
pop stk = do
  ConsA sa stk' <- pure $ popA stk
  hd <- indexSmallArrayM sa 0
  Just (hd, stk')

data ViewA a = EmptyA | ConsA !(SmallArray a) (Stack a)

popA :: Stack a -> ViewA a
popA Empty = EmptyA
popA (Three sa1 sa2 sa3 more) = ConsA sa1 (Two sa2 sa3 more)
popA (Two sa1 sa2 more) = ConsA sa1 (One sa2 more)
popA (One sa more) = ConsA sa $
  case popA more of
    EmptyA -> Empty
    ConsA sa1 more' -> Two sa2 sa3 more'
      where
        len' = sizeofSmallArray sa1 `quot` 2
        sa2 = cloneSmallArray sa1 0 len'
        sa3 = cloneSmallArray sa1 len' len'
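
To try the code above without the primitive package, here is a sketch of the same structure with plain lists standing in for `SmallArray` (an assumption made purely for testing; lists defeat the compactness goal). `toList` and `fromList` are helper names invented for this sketch.

```haskell
import Data.List (foldl')

-- List-backed model of the 1/2/3 stack: [a] stands in for SmallArray a.
data Stack a
  = Empty
  | One [a] !(Stack a)
  | Two [a] [a] (Stack a)        -- lazy tail, as in the code above
  | Three [a] [a] [a] !(Stack a)

data ViewA a = EmptyA | ConsA [a] (Stack a)

push :: a -> Stack a -> Stack a
push x = pushA [x]

pushA :: [a] -> Stack a -> Stack a
pushA sa Empty = One sa Empty
pushA sa1 (One sa2 more) = Two sa1 sa2 more
pushA sa1 (Two sa2 sa3 more) = Three sa1 sa2 sa3 more
pushA sa1 (Three sa2 sa3 sa4 more) = Two sa1 sa2 (pushA (sa3 ++ sa4) more)

pop :: Stack a -> Maybe (a, Stack a)
pop stk = case popA stk of
  ConsA (hd : _) stk' -> Just (hd, stk')
  _ -> Nothing

popA :: Stack a -> ViewA a
popA Empty = EmptyA
popA (Three sa1 sa2 sa3 more) = ConsA sa1 (Two sa2 sa3 more)
popA (Two sa1 sa2 more) = ConsA sa1 (One sa2 more)
popA (One sa more) = ConsA sa $
  case popA more of
    EmptyA -> Empty
    ConsA sa1 more' -> Two sa2 sa3 more'
      where
        -- Borrow from the next level: split its array into two halves.
        (sa2, sa3) = splitAt (length sa1 `quot` 2) sa1

-- Pop everything, most recently pushed element first.
toList :: Stack a -> [a]
toList stk = maybe [] (\(x, stk') -> x : toList stk') (pop stk)

-- Push the elements left to right, so the last one ends up on top.
fromList :: [a] -> Stack a
fromList = foldl' (flip push) Empty

main :: IO ()
main = print (toList (fromList [1 .. 10 :: Int]))
```

Pushing 1 through 10 and then popping everything should give the elements back in reverse order, which is a quick sanity check on the carry and borrow cases.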

The first important step in proving this has the desired amortized bounds is to choose a debit invariant[*]. This had me stuck for quite a while, but I think I've got it.

Debit invariant: We allow the lazy Stack in a Two node as many debits as there are elements stored in that and all earlier Two nodes.
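
To make the invariant concrete, here is a small sketch that computes the allowance for a given shape. It assumes that "earlier" means nearer the root and that every array at depth d holds 2^d elements, so a Two node at depth d stores 2 * 2^d elements. The `Digit` type and `allowances` are names made up for this sketch.

```haskell
-- Shape of a stack: one digit per node, root first.
data Digit = D1 | D2 | D3 deriving (Eq, Show)

-- For each node, the debit allowance on its lazy tail:
-- Just k for a Two node, Nothing for One and Three (their tails are strict).
allowances :: [Digit] -> [Maybe Int]
allowances = go 0 0
  where
    go :: Int -> Int -> [Digit] -> [Maybe Int]
    go _ _ [] = []
    go d acc (D2 : ds) =
      -- A Two node at depth d holds two arrays of 2^d elements each;
      -- its allowance also counts every Two node above it.
      let acc' = acc + 2 * 2 ^ d
      in Just acc' : go (d + 1) acc' ds
    go d acc (_ : ds) = Nothing : go (d + 1) acc ds

main :: IO ()
main = print (allowances [D2, D3, D2, D1, D2])
```

For the shape Two, Three, Two, One, Two, the three Two nodes store 2, 8, and 32 elements, so their allowances are 2, 10, and 42.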

Theorem

push and pop run in O(log n) amortized time.

Proof sketch

Push

We consider each of the cases in turn.

Empty is always trivial.
One: We increase the debit allowance below.
Two: We reduce the debit allowance of nodes below by 1 unit. We pay O(log n) to discharge the excess debits.
Three: This is the tricky case for push. We have some number of Three nodes followed by something else. For each Three node, we suspend the array-doubling work (the append of its last two arrays). We pay for that using the additional debit allowance we gain from the elements in the new Two node. When we reach the end of the Three chain, we need to do something a bit funny: we may need the full debit allowance below, so we use debit passing to spread the debits for the final array append across all the earlier nodes.

At the end, we have either Empty, One, or Two. If we have Empty or One, we're done. If we have Two, then changing that to Three reduces the debit allowance below. But we also gain debit allowance below, from all the Threes that have changed to Twos! Our net loss of debit allowance is just 1, so we're golden.

Pop

We again proceed by cases.

Empty is trivial.
Three: We increase the debit allowance below.
Two: We reduce the debit allowance on certain nodes by 1 unit. We pay O(log n) to discharge the excess debits.
One: This is the hard case. We have some number of One nodes followed by something else. For each One, we perform a split. We place debits to pay for those, discharging the ones at the root. At the end, we have a situation similar to that for push: the tricky case is ending in Two, where we use the fact that all the new Twos pay for the loss of the final Two.

Compactness

One might worry that enough thunks could accumulate in the structure to negate the compactness of the array-based representation. Fortunately, this is not the case. A thunk can appear only on the Stack in a Two node. But any operation on that node will turn it into a One or a Three, forcing the Stack. So thunks can never accumulate in chains, and we never have more than one thunk per node.

[*] Okasaki, C. (1998). Purely Functional Data Structures. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511530104, or read the relevant parts of his thesis online.

My gut feeling is that these `push` and `pop` methods do *not* achieve `O(log n)` complexity. Admittedly I haven't done the math yet. — Bergi, Jul 01 '20 at 18:47
@Bergi, you're right; `Zero` is a bigger problem for `pop` than `Two` is for `push`. I'm going to try switching to a 1/2/3 binary system and see if that can patch it up. — dfeuer, Jul 01 '20 at 19:03
@Bergi, I'm pretty sure I fixed it now. The proof remains a bit incomplete. — dfeuer, Jul 01 '20 at 23:02
@Li-yaoXia, perhaps unwisely, I stated the debit Invariant *before* the **Theorem** heading. — dfeuer, Jul 04 '20 at 03:14
nah, that's totally fine. I missed it because I was looking for some mathy notation. — Li-yao Xia, Jul 04 '20 at 19:22
Here's a proof in Coq https://gist.github.com/Lysxia/3116fc0e0b118361e3746dd03dc10782 — Li-yao Xia, Jul 14 '20 at 00:54
@Li-yaoXia, you sure went all the way! I have queue, deque, and input/output-restricted deque versions now. Hope to give a short talk at the Haskell Love conference. — dfeuer, Jul 14 '20 at 01:00
