15

So ... I've been defaulting to unboxed vectors (from the vector package) without giving it much thought. vector-th-unbox makes creating instances for them a breeze, so why not.

Now I have run into a case where I cannot derive those instances automatically: a data type with phantom type parameters (as in Vector (s :: Nat) a, where s encodes the length).

This made me think about the differences between Storable and Unboxed vectors. Things I figured out on my own:

  • Unboxed will store e.g. tuples as separate vectors, leading to better cache locality by not wasting bandwidth when only one of those values is needed.
  • Storable will still be compiled to simple (and probably efficient) readArray#s that return unboxed values (as is evident from reading Core).
  • Storable allows direct pointer access which allows interoperability with foreign code. Unboxed doesn't.
  • [edit] Storable instances are actually easier to write by hand than Unbox (that is Vector and MVector) ones.
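A minimal sketch of the points above, assuming a hypothetical element type `Tagged` (a `Double` with a phantom `Nat` index, standing in for the real type) and made-up sample data. It shows a hand-written `Storable` instance, raw pointer access via `unsafeWith`, and the O(1) projection that the split representation of unboxed tuples allows:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

import qualified Data.Vector.Storable as S
import qualified Data.Vector.Unboxed as U
import Foreign.Ptr (castPtr)
import Foreign.Storable (Storable (..))
import GHC.TypeLits (Nat)

-- Hypothetical element type: a Double tagged with a phantom Nat index.
newtype Tagged (s :: Nat) = Tagged Double
  deriving (Eq, Show)

-- Hand-written Storable instance that reuses the layout of Double.
instance Storable (Tagged s) where
  sizeOf _ = sizeOf (0 :: Double)
  alignment _ = alignment (0 :: Double)
  peek p = Tagged <$> peek (castPtr p)
  poke p (Tagged x) = poke (castPtr p) x

tagged :: S.Vector (Tagged 3)
tagged = S.fromList [Tagged 1, Tagged 2, Tagged 3]

-- An unboxed vector of pairs is held as a pair of vectors, so taking
-- one component with unzip does not copy or traverse the other.
pairs :: U.Vector (Int, Double)
pairs = U.fromList [(1, 1.5), (2, 2.5), (3, 3.5)]

firsts :: U.Vector Int
firsts = fst (U.unzip pairs)

-- Read the element at index 1 through the raw pointer that a storable
-- vector exposes (assumes the vector has at least two elements).
secondElem :: S.Vector (Tagged 3) -> IO (Tagged 3)
secondElem v = S.unsafeWith v (\ptr -> peekElemOff ptr 1)

main :: IO ()
main = do
  print (U.toList firsts)
  secondElem tagged >>= print
```

The instance only has to supply `sizeOf`, `alignment`, `peek` and `poke`; an `Unbox` instance for the same type would need `Vector` and `MVector` instances as well.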

That alone doesn't make it evident to me why Unboxed even exists; there seems to be little benefit to it. Am I missing something?

fho
  • I swear I ranted about this somewhere, but I cannot find it anymore :( I'm not aware of any performance difference between `Unboxed` and `Storable` vectors. You may find [this answer](http://stackoverflow.com/a/21897900/925978) helpful, as well as my comment on the OP. – crockeea Oct 21 '16 at 13:12
  • Hmm ... the "everyone wants unboxed without knowing why" comment seems appropriate. – fho Oct 21 '16 at 13:38
  • Haha, it *was* in there! I suppose I'm interested in an answer to this question, since it was essentially posed in my (linked) answer as well. – crockeea Oct 21 '16 at 14:27
  • `Storable` vectors are just `ForeignPtr`s, so they have all the same properties: running a finalizer, not movable by GC, etc. Furthermore, `Storable` instances should lay out arrays of the type in a way that C would recognize as an array (i.e. contiguous, correctly aligned). `Unbox`, however, allows you to write custom instances which do whatever the hell you want! E.g. you may read every `k`th element more frequently than the rest, so you may decide to store those elements in a contiguous block, or as you mentioned, store tuples separately for fewer cache misses. – user2407038 Oct 21 '16 at 19:58
  • .. I think the doc states quite clearly why `Unbox` exists: "Data.Vector.Unboxed: Unboxed vectors with an **adaptive representation** based on data type families." – user2407038 Oct 21 '16 at 20:00

2 Answers

22

Cribbed from https://haskell-lang.org/library/vector

Storable and unboxed vectors both store their data in a byte array, avoiding pointer indirection. This is more memory efficient and allows better usage of caches. The distinction between storable and unboxed vectors is subtle:

  • Storable vectors require data which is an instance of the Storable type class. This data is stored in malloced memory, which is pinned (the garbage collector can't move it around). This can lead to memory fragmentation, but allows the data to be shared over the C FFI.
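  • Unboxed vectors require data which is an instance of the Unbox type class (strictly speaking, Prim is the constraint of the closely related Data.Vector.Primitive vectors, on which the Unbox instances for primitive types are built). This data is stored in GC-managed unpinned memory, which helps avoid memory fragmentation. However, this data cannot be shared over the C FFI.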

Both the Storable and Prim typeclasses provide a way to store a value as bytes, and to load bytes into a value. The distinction is what type of bytearray is used.

As usual, the only true measure of performance will be benchmarking. However, as a general guideline:

  • If you don't need to pass values to a C FFI, and you have a Prim instance, use unboxed vectors.
  • If you have a Storable instance, use a storable vector.
  • Otherwise, use a boxed vector.
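The guideline can be sketched as follows. The names and sample data are made up for illustration; `convert` from Data.Vector.Generic copies between vector flavours when a buffer does have to cross the FFI later on. Note that tuples have `Unbox` instances even though they are not primitive types:

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Storable as S
import qualified Data.Vector.Unboxed as U

-- No FFI involved and the element type is unboxable: unboxed vector.
samples :: U.Vector (Int, Bool)
samples = U.fromList [(1, True), (2, False), (3, True)]

-- Data that may be handed to C: storable vector.
signal :: S.Vector Double
signal = S.fromList [0.0, 0.5, 1.0]

-- String has neither a Storable nor an Unbox instance: boxed vector.
names :: V.Vector String
names = V.fromList ["alpha", "beta"]

-- Copy an unboxed vector into a storable one, e.g. right before an FFI call.
weights :: U.Vector Double
weights = U.enumFromN 0 5

weightsForC :: S.Vector Double
weightsForC = G.convert weights

main :: IO ()
main = do
  print (U.length samples, S.length signal, V.length names)
  print (S.toList weightsForC == U.toList weights)
```

Converting costs a copy, so whether it beats using storable vectors throughout is something to benchmark.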

There are also other issues to consider, such as the fact that boxed vectors are instances of Functor while storable and unboxed vectors are not.
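A small illustration of the Functor point, with made-up values: `fmap` has to work for any element type, while unboxed and storable vectors constrain their elements, so they offer a monomorphic `map` instead.

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

-- Boxed vectors are Functors, so fmap works directly.
boxedDoubled :: V.Vector Int
boxedDoubled = fmap (* 2) (V.fromList [1, 2, 3])

-- Unboxed vectors have no Functor instance; U.map requires Unbox
-- on both the input and the output element type.
unboxedDoubled :: U.Vector Int
unboxedDoubled = U.map (* 2) (U.fromList [1, 2, 3])

main :: IO ()
main = do
  print (V.toList boxedDoubled)
  print (U.toList unboxedDoubled)
```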

Michael Snoyman
  • Hmm... I didn't know there were tutorials on the haskell-lang page. Maybe a paragraph like that should be added to the vector documentation. – fho Oct 23 '16 at 07:24
  • Maybe send them a PR? Coming from a user who was confused about the library it's more likely to be merged in. – Michael Snoyman Oct 23 '16 at 09:18
0

Another difference is memory overhead:

As per my measurements:

  • Data.Vector.Storable.Vector Int has 64 bytes of overhead.
  • Data.Vector.Unboxed.Vector Int has 48 bytes of overhead.

Source:

nh2