Implementing efficient zipper like data structure in Haskell with O(1) elements access

Question

The question

I want to create a datatype, which will allow for fast access and modification of its elements. Is it possible to create in Haskell a structure and functions, which will perform as fast as simple C++ implementation?

Problem details

I'm writing an compiler in Haskell. I've got AST represented by a datatype, let's consider following one:

import Prelude hiding (id)

-- this is a sample data type, the real one has got a lot of constructors
data AST = A { id :: Int, x :: AST, y :: AST, z :: AST }
         | B { id :: Int }
         | C { id :: Int, x :: AST, y :: AST }
         | D { id :: Int, u :: AST, v :: AST, w :: AST}

Each AST node has got an unique identifier. I would love to implement in Haskell following functionality:

a function getById, which will return an AST node of choosen id in O(1) time complexity.
be able to create "focuses" on the structure and modify focused elements independently from each other. So I would like to be able to remember focuses on some sub-trees and be able to modify each of such focus in O(1) time complexity.

I was thinking about Zippers, but there are 3 problems with them:

They are used (as far as I know) with simple datatypes, like binary trees, where we can say, we choose "left" or "right" branch. Is there any simple way to use them on complex datatypes like the one above?
I think they will not allow me to implement the function getById with O(1)time complexity, am I right?
I think it is impossible to create some independent focuses using Zippers. By independent focuses I mean, focuses, which will allow us to modify different parts of the datatype without need of recomputing other focuses (in O(1)).

The C++ way of thinking

In C++ we would be able to create array of pointers to AST nodes nodePtrs. The function nodeById would perform in O(1), simply by accessing *(nodePtrs[id]). Because the C++ structure is mutable, we would be able to modify its elements without any restrictions in O(1).

Mutation is not bad till it is controlled. Haskell has many packages which provide mutable data structures like `MVector` — Ankur, Nov 25 '13 at 05:09
You should try getting into a Haskell mindset. Mutating your AST is not the Haskell way. — augustss, Nov 25 '13 at 07:10
@Ankur: I would prefer to stay with functional way as augustss suggests. Is the mutable way the only one to get the same speed as in C++? — Wojciech Danilo, Nov 25 '13 at 09:25
@augustss: I feel it too, but how can I gain the same speed with unmutable data? — Wojciech Danilo, Nov 25 '13 at 09:34
@danilo2 Have you done it the Haskell way and found it too slow? — augustss, Nov 25 '13 at 11:46
@augustss: We are not talking, that it is too slow. I know that checking in map would give me O(nlogn) not O(1). Checking in map is not faster, is it? Do I have to benchmark it to prove it? Or there is other "haskellish" way to do it? — Wojciech Danilo, Nov 25 '13 at 12:06
@danilo2 - the "Haskellish" way is to use a set of datatypes to represent each element of the AST rather than a single (mostly untyped) AST node - see this for example - http://hackage.haskell.org/package/language-glsl-0.0.0/docs/Language-GLSL-Syntax.html — stephen tetley, Nov 25 '13 at 12:26
I'm not sure why you need fast access to individual nodes. Usually you just recurse of the the AST and do some transformations. Accessing individual nodes that by an ID is rare (I've written a few compilers). — augustss, Nov 25 '13 at 14:51

score 2 · Answer 1 · answered Nov 25 '13 at 10:28

I think zippers are actually always atainable, have you heard about differentiation?

Well, about getById, I'm not sure it's a good idea. You probably want a Haskell function like getById :: Int -> IO AST that uses lookup in an array or something. But since you later want to be able to modify values (at least conceptually), do you want getById to return the new modified AST-values or the first AST-value that got stored? This all becomes problematic. It might be a good idea to see if you can just remove the IDs for your haskell version.

I think your focuses sounds doable. If we say that ZAST is the zipper datatype for the AST. Then you perhaps could have something like.

makeFocus :: ZAST -> Focus ZAST

type Focus =
  (ZAST -> ZAST) -> -- The modifier of the "below part"
  ZAST ->           -- The new "above part", you have to provide it again as it might have changed
  ZAST              -- The Result

But yea, it's not really as convenient as the C++ way.

In conclusion, I think you should take a step back and see if what actually you're trying to do (AST optimizations, emitting assembly, etc.) can be done efficiently with immutable data structures. Rather than fix your mind on trying to achieve the same specifications of the mutable C++ data structure in Haskell.

Implementing efficient zipper like data structure in Haskell with O(1) elements access

1 Answers1