8

I am looking for a data structure in Haskell that supports both fast indexing and fast append. This is for a memoization problem which arises from recursion.

From the way vectors work in C++ (which are mutable, but that shouldn't matter in this case) it seems immutable vectors with both (amortized) O(1) append and O(1) indexing should be possible (ok, it's not, see the comments to this question). Is this possible in Haskell, or should I go with Data.Sequence, which (AFAICT) has O(1) append and O(log(min(i,n-i))) indexing?
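
For concreteness, here is a rough sketch of the kind of table-building loop I have in mind with Data.Sequence (buildTable and fibTable are just illustrative names I made up, not from any library):

import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (|>), index)

-- Illustrative only: grow the memo table one entry at a time, reading
-- earlier entries by index. (|>) is O(1); index is O(log(min(i,n-i))).
buildTable :: Int -> (Seq Int -> Int) -> Seq Int
buildTable n step = go Seq.empty
  where go s | Seq.length s == n = s
             | otherwise         = go (s |> step s)

-- Example use: Fibonacci numbers via the table.
fibTable :: Int -> Seq Int
fibTable n = buildTable n f
  where f s | i < 2     = i
            | otherwise = s `index` (i-1) + s `index` (i-2)
          where i = Seq.length s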

On a related note, as a Haskell newbie I find myself longing for a practical, concise guide to Haskell data structures. Ideally this would give a fairly comprehensive overview of the most practical data structures along with their performance characteristics and pointers to the Haskell libraries where they are implemented. It seems that there is a lot of information out there, but I have found it to be a little scattered. Am I asking too much?

Paul
  • I'm pretty sure no such data structure exists. If all your appends happen linearly, use `Data.Vector` since fusion will give you the performance you want, otherwise use `Data.Sequence` – Philip JF May 03 '12 at 21:56
  • Thanks. I think it should be possible, but maybe too specialized ... Can you explain what you mean by 'all your appends happen linearly'? Do you mean with linearly increasing indices? It seems with vector each operation would still be O(n) ... – Paul May 03 '12 at 22:01
  • basically, if you can think of the vector as being "threaded" through your code in a way that doesn't save intermediate values, the appends can be "fused" into a single operation. – Philip JF May 03 '12 at 22:03
  • "From the way vectors work in c++ (which are mutable, but that shouldn't matter in this case) it seems immutable vectors with both O(1) append and O(1) indexing should be possible". How do you imagine such a data structure to work? How would it avoid copying the data on insertion and still have `O(1)` complexity for that operation? In short, I think you're wrong here. The mutability of the C++ vector *does* matter here. – Niklas B. May 03 '12 at 22:03
  • @NiklasB. In C++, vectors are essentially a pointer to the beginning of the data plus the length of the data. So you could append by writing the new element after the existing data in memory and creating a new pointer with the incremented size. So it is only amortized O(1), but still. The old pointer and old size would be left unchanged and thus the old vector would appear unchanged. – Paul May 03 '12 at 22:07
  • 2
    @Paul: Well, immutability requires that you could perform the append operation on the same base vector with two different values and it should yield two different vectors as a result. That's just impossible with the scheme you describe (the vector would have to be copied at least once). – Niklas B. May 03 '12 at 22:10
  • @NiklasB. Ok, that's true... shoot – Paul May 03 '12 at 22:11

3 Answers

10

For simple memoization problems, you typically want to build the table once and then not modify it later. In that case, you can avoid having to worry about appending, by instead thinking of the construction of the memoization table as one operation.

One method is to take advantage of lazy evaluation and refer to the table while we're constructing it.

import Data.Array

-- Memo table: later entries refer lazily back to earlier entries of fibs itself.
fibs :: Array Int Integer
fibs = listArray (0, n-1) $ 0 : 1 : [fibs!(i-1) + fibs!(i-2) | i <- [2..n-1]]
  where n = 100

This method is especially useful when the dependencies between the elements of the table make it difficult to come up with a simple order of evaluating them ahead of time. However, it requires using boxed arrays or vectors, whose extra overhead may make this approach unsuitable for large tables.

For unboxed vectors, you have operations like constructN, which lets you build a table in a pure way while using mutation underneath to make it efficient. It does this by giving the function you pass an immutable view of the prefix of the vector constructed so far, which you can then use to compute the next element.

import Data.Vector.Unboxed as V

-- The function passed to constructN sees the immutable prefix built so far;
-- its length tells us which index we are currently computing.
fibs :: V.Vector Int
fibs = constructN 100 f
  where f xs | i < 2 = i
             | otherwise = xs!(i-1) + xs!(i-2)
             where i = V.length xs
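
For example, once a module containing the definition above is loaded in GHCi, indexing into the memo table is an ordinary vector lookup (the exact prompt will vary):

ghci> fibs V.! 10
55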
hammar
  • Wow, this `constructN` is swank. – applicative May 04 '12 at 00:13
  • 1
    I could be mistaken, but I think that is going to exceed the size of `Int` and there is no `Unbox Integer` instance in Data.Vector.Unboxed. – Doug Moore May 04 '12 at 01:07
  • 1
    @DougMoore: Yes, this will overflow. The point was to illustrate memoization, not to provide a good way of computing Fibonacci numbers. For that, there are much better algorithms which don't require any memoization :) – hammar May 04 '12 at 01:11
  • @hammar I know, it just isn't often that I get to make a comment. :) – Doug Moore May 04 '12 at 01:14
  • 2
    @hammar Thank you for this answer. I am not quite sure if my problem is simple enough to fit this approach (it uses the memoization fixpoint function + a function which gives the terminal state + an update rule which takes a (persumably) memoized version of itself). I'll have to think about it a bit. – Paul May 04 '12 at 07:15
9

If memory serves, C++ vectors are implemented as an array together with its capacity and current size. When an insertion would push the size past the capacity, the capacity is doubled. This gives amortized O(1) insertion (not plain O(1), as originally claimed), and it can be emulated just fine in Haskell using the Array type, wrapped in a suitable IO or ST layer.
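
As a very rough sketch of what that emulation could look like in ST (the Growable type and the names newGrowable, push, and readAt are invented for this example, not taken from any library):

import Control.Monad.ST
import Data.STRef
import Data.Array.ST (STArray, newArray_, readArray, writeArray, getBounds)

-- Invented for this sketch: a mutable array plus a count of the slots in use.
data Growable s a = Growable (STRef s (STArray s Int a)) (STRef s Int)

newGrowable :: ST s (Growable s a)
newGrowable = do
  arr <- newArray_ (0, 3)                  -- start with capacity 4
  Growable <$> newSTRef arr <*> newSTRef 0

-- Amortized O(1) append: double the capacity whenever the array is full.
push :: Growable s a -> a -> ST s ()
push (Growable arrRef sizeRef) x = do
  arr     <- readSTRef arrRef
  size    <- readSTRef sizeRef
  (_, hi) <- getBounds arr
  arr' <- if size > hi
            then do bigger <- newArray_ (0, 2 * (hi + 1) - 1)
                    mapM_ (\i -> readArray arr i >>= writeArray bigger i) [0 .. hi]
                    writeSTRef arrRef bigger
                    return bigger
            else return arr
  writeArray arr' size x
  writeSTRef sizeRef (size + 1)

-- O(1) indexing into the prefix that has been written so far.
readAt :: Growable s a -> Int -> ST s a
readAt (Growable arrRef _) i = readSTRef arrRef >>= \arr -> readArray arr i

Wrapping the whole construction in runST (and freezing to an immutable Array at the end, e.g. with Data.Array.MArray.freeze) keeps the interface pure from the outside.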

Daniel Wagner
  • 2
    Yes, it is amortized O(1), I know, forgot to mention it. I will checkout Array, do you have a good source for learning about it in conjunction with IO or ST? I would prefer the code to be pure if at all possible and have no experience with the ST monad. – Paul May 03 '12 at 22:03
  • 2
    This code isn't going to be pure. That's just not possible. Mutability versus non-mutability _does_ make the difference. – Louis Wasserman May 03 '12 at 22:04
  • @Paul [Lazy Functional State Threads](http://www.cs.fit.edu/~ryan/library/functional_programming/lazy-functional-state-threads.pdf) is the original `ST` paper, I guess. – Daniel Wagner May 03 '12 at 22:07
  • Indeed, I'd be a little surprised if it's possible even in a mutable language to have O(1) insertion and O(1) indexing since either you have a linked-style structure (you lose O(1) indexing) or you have to move your data around when it grows (you lose O(1) insertion). – Venge May 03 '12 at 22:09
  • @Patrick amortized performance is the meaningful metric when dealing with appends, since even allocation doesn't have O(1) worst case performance in real systems. `std::vector` gets amortized O(1) appends, reads, and updates, but is only ephemeral. – Philip JF May 03 '12 at 22:20
  • 2
    @Paul I've done the first 1% of writing a library that does this and [stuck it on hpaste](http://hpaste.org/68041). I waive whatever copyrights I had left after I uploaded it to hpaste. You'll probably want to expand the API, change the names, benchmark, and do all that good stuff, but it should at least give you a flavor of programming in the `ST` monad to get you started with. – Daniel Wagner May 03 '12 at 22:52
  • @PhilipJF Well, sure, obviously in the real world we only care about amortization. I meant it more as a pure CS question. – Venge May 04 '12 at 03:40
  • 1
    In a pure CS universe you can have both: an infinite sized array with a pointer to the last element used--no copying necessary. Okay, perhaps you dont believe in infinite sized, but clearly we can make recopies as infrequent as we desire by just using a larger block of RAM. Exponentially growing dynamic arrays are optimal in the meaningful dimensions. – Philip JF May 04 '12 at 04:41
  • @DanielWagner Thank you so much for the link re ST Monad. The Haskell community continues to amaze me ;) – Paul May 04 '12 at 07:11
  • The big problem with Array is that it doesn't have a safe way of indexing into it. `Data.Vector` is probably the right choice if you care about that. – JonnyRaa Jul 12 '19 at 11:47
7

Take a look at this to make a more informed choice of what you should use.

But the simple thing is, if you want the equivalent of a C++ vector, use Data.Vector.
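
For example (a minimal sketch; note that snoc copies, so a single append on an immutable Vector is O(n), while indexing is O(1)):

import qualified Data.Vector as V

main :: IO ()
main = do
  let v = V.fromList [1, 2, 3 :: Int]
      w = V.snoc v 4     -- O(n): builds a new vector
  print (w V.! 2)        -- O(1) indexing; prints 3
  print (V.length w)     -- prints 4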

trutheality