
If I'm counting the occurrences of characters in a string, I could easily implement this using an array in an imperative language, such as the following:

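```c
#include <stdio.h>

int main(void) {
  int values[256] = {0};  /* one zero-initialized counter per byte value */
  int c;                  /* int, not char, so EOF can be told apart from data */
  while ((c = getchar()) != EOF) {
    values[(unsigned char)c] += 1;  /* cast keeps the index in 0..255 */
  }
}
```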

I can see how to do this in Haskell using something like `Data.Vector.Mutable`, which provides a fast implementation of Int-indexed mutable arrays.

But how could I easily do this using just Haskell, with no additional packages and/or extensions? Or, in other words, how can I implement a fast collection with O(1) indexing and mutability?

Jakub Arnold
  • @Lee not sure if it's just about accessing the index, since the data structure also needs to be mutable (or provide a way to work around mutability otherwise while keeping the O(1)) – Jakub Arnold Nov 27 '14 at 11:37
  • Why do you want to do it without additional packages? If you want a mutable array that's exactly what `Data.Vector.Mutable` is for! – Tom Ellis Nov 27 '14 at 11:45
  • @TomEllis Just because there's a library to do something, it doesn't mean you should always use the library. I'm trying to understand how this works underneath and how I can implement it in a simple way myself. Re-implementing a library is the best way to understand how it works. – Jakub Arnold Nov 27 '14 at 11:47
  • @TomEllis Also, I've looked at the source code for Vector, but it's a rather large codebase, so I'm basically looking for the basic idea that enables it to be efficient. – Jakub Arnold Nov 27 '14 at 11:47
  • It probably is implemented through compiler intrinsics. If you wanted to implement something like that yourself, you might need to use FFI - foreign function interface. It's not that hard, but may look weird to a novice. – Sassa NF Nov 27 '14 at 11:53
  • possible duplicate of [Frequency of characters](http://stackoverflow.com/questions/21132026/frequency-of-characters) – josejuan Nov 27 '14 at 11:55
  • @SassaNF Actually if you look at the source of the repo, there is no FFI, it's all pure Haskell https://github.com/haskell/vector – Jakub Arnold Nov 27 '14 at 11:56
  • @josejuan This is **not a duplicate**, the referenced question uses `vector` for a solution. What I'm asking is how to implement a data structure *that has vector-like properties*. Please read the updated title of the question. – Jakub Arnold Nov 27 '14 at 11:57
  • If you want to implement something like this yourself using *only* "base", and no FFI or compiler intrinsics, you can do it by manually allocating and modifying memory. See [`ForeignPtr`](http://hackage.haskell.org/package/base-4.7.0.1/docs/Foreign-ForeignPtr.html) (or `Ptr` if you want complete control of deallocation). – gspr Nov 27 '14 at 11:57
  • You could write the same (imperative) algorithm with [STUArray](https://hackage.haskell.org/package/array-0.5.0.0/docs/Data-Array-ST.html#g:2) from the array package, shipped with GHC, I guess. – Alp Mestanogullari Nov 27 '14 at 12:10
  • I think you will find it ends up using hackage.haskell.org/package/primitive-0.2.1/docs/src/Data-Primitive-Array.html - here you can see things like `primitive_ (writeArray# arr# i# x)`. I don't know how to parse those, but I bet they come from GHC-specific intrinsics. – Sassa NF Nov 27 '14 at 12:20
  • @JakubArnold, your question "How can I implement a collection with O(1) indexing and mutability in Haskell?" is clearly duplicated. *NOW* ("read the updated title") you are asking something else... (low level implementation or how current packages implement it) :D – josejuan Nov 27 '14 at 13:45
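As a rough sketch of gspr's suggestion, the character count from the question can be written against base alone, with no extensions, by allocating a 256-slot buffer with `mallocForeignPtrArray` and updating it in place with `peekElemOff`/`pokeElemOff`. The name `countChars` and the decision to skip characters above code point 255 are assumptions made for this example, not part of the question.

```haskell
import Data.Char (ord)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Hypothetical helper: count how often each character code 0..255 occurs.
-- The counters live in a raw, GC-managed buffer of 256 Ints that is
-- read and written in place, so each update is O(1).
countChars :: String -> IO [Int]
countChars str = do
  fp <- mallocForeignPtrArray 256 :: IO (ForeignPtr Int)
  withForeignPtr fp $ \p -> do
    pokeArray p (replicate 256 0)  -- allocated memory is not guaranteed to be zeroed
    mapM_ (bump p) str
    peekArray 256 p                -- copy the counters out into a list
  where
    bump p ch
      | n < 256 = do
          old <- peekElemOff p n     -- O(1) read at offset n
          pokeElemOff p n (old + 1)  -- O(1) in-place write
      | otherwise = return ()        -- assumption: skip code points >= 256
      where
        n = ord ch
```

The `ForeignPtr` releases the buffer once it becomes unreachable; with a plain `Ptr` (for example from `mallocArray`) you would call `free` yourself, which is the trade-off gspr mentions.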



The implementation of vector uses internal GHC functions called primops. You can find them in the package ghc-prim, which is hard-wired into GHC. It provides, among others, the following array functions:

```haskell
newArray#   :: Int# -> a -> State# s -> (# State# s, MutableArray# s a #)
readArray#  :: MutableArray# s a -> Int# -> State# s -> (# State# s, a #)
writeArray# :: MutableArray# s a -> Int# -> a -> State# s -> State# s
```

These functions are implemented by GHC itself, but they are really low-level. The primitive package provides nicer wrappers of these functions. For arrays, these are:

```haskell
newArray   :: PrimMonad m => Int -> a -> m (MutableArray (PrimState m) a)
readArray  :: PrimMonad m => MutableArray (PrimState m) a -> Int -> m a
writeArray :: PrimMonad m => MutableArray (PrimState m) a -> Int -> a -> m ()
```

Here is a simple example using these functions directly (IO is a PrimMonad):

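```haskell
import Data.Primitive.Array
import Control.Monad

main :: IO ()
main = do
  -- allocate a boxed array of 3 Ints, all initialized to 0
  arr <- newArray 3 (0 :: Int)
  -- overwrite each slot in place
  writeArray arr 0 1
  writeArray arr 1 3
  writeArray arr 2 7
  -- read each slot back; prints 0:1, 1:3 and 2:7 on separate lines
  forM_ [0..2] $ \i -> putStr (show i ++ ":") >> readArray arr i >>= print
```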
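To make the relationship between the two layers more concrete, here is a rough, hand-rolled wrapper over the primops listed above, specialized to `IO` only. It needs the `MagicHash` and `UnboxedTuples` extensions, which is exactly the plumbing a wrapper library hides from you. The `MArr` type and its three functions are hypothetical names, and this is a sketch of the idea rather than how primitive is actually written (primitive generalizes over `PrimMonad`).

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (Int (I#), MutableArray#, RealWorld,
                 newArray#, readArray#, writeArray#)
import GHC.IO (IO (..))

-- Hypothetical boxed wrapper around the unlifted primop array type.
data MArr a = MArr (MutableArray# RealWorld a)

-- Allocate n slots, all set to x: unbox the Int to an Int#, thread the
-- State# token through the primop, and box the resulting array.
newMArr :: Int -> a -> IO (MArr a)
newMArr (I# n) x = IO $ \s ->
  case newArray# n x s of
    (# s', arr #) -> (# s', MArr arr #)

-- O(1) read: the primop already returns the (# State#, value #) pair IO expects.
readMArr :: MArr a -> Int -> IO a
readMArr (MArr arr) (I# i) = IO $ \s -> readArray# arr i s

-- O(1) in-place write: the primop only returns a new State# token.
writeMArr :: MArr a -> Int -> a -> IO ()
writeMArr (MArr arr) (I# i) x = IO $ \s ->
  case writeArray# arr i x s of
    s' -> (# s', () #)
```

Like the primops themselves, these functions do no bounds checking.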

Of course, in practice you would just use the vector package, which is much more optimized (stream fusion, ...) and also easier to use.
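Alp Mestanogullari's comment on the question points to a middle ground: `STUArray` from the array package, which ships with GHC, gives O(1) unboxed reads and writes inside `ST` without depending on vector or primitive. A minimal sketch of the character count in that style follows; `charCounts` and `countOf` are hypothetical names, and characters above code point 255 are simply skipped.

```haskell
import Control.Monad.ST (ST)
import Data.Array.ST (STUArray, newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, (!))
import Data.Char (ord)

-- Build a table of character counts for codes 0..255.
-- runSTUArray runs the mutable loop in ST and then freezes the
-- STUArray into an immutable UArray without copying it.
charCounts :: String -> UArray Int Int
charCounts str = runSTUArray $ do
  counts <- newArray (0, 255) 0  -- 256 unboxed counters, all 0
  mapM_ (bump counts) (filter ((< 256) . ord) str)
  return counts
  where
    bump :: STUArray s Int Int -> Char -> ST s ()
    bump arr c = do
      let i = ord c
      old <- readArray arr i       -- O(1) read
      writeArray arr i (old + 1)   -- O(1) in-place write

-- Look up the count for one character (0 for codes >= 256).
countOf :: String -> Char -> Int
countOf str c
  | ord c < 256 = charCounts str ! ord c
  | otherwise   = 0
```

Because `runSTUArray` confines the mutation to `ST`, `charCounts` is an ordinary pure function from the caller's point of view.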

bennofs