How to roll a fast BVH representation in Haskell

Question

I'm playing with a Haskell raytracer and currently use a BVH implementation that relies on a naive binary tree to store the hierarchy:

data TreeBvh
   = Node Dimension TreeBvh TreeBvh AABB
   | Leaf AnyPrim AABB

where Dimension is either X, Y or Z (used for faster traversal) and AABB is my type for an axis-aligned bounding box. This works reasonably well, but I'd really like to get it as fast as I possibly can. My next step (if I were using C/C++) would be to use this tree to construct a flattened representation where the nodes are stored in an array, the "left" child immediately follows its parent node, and the index of the right child is stored with the parent, so I have something like this:

data LinearNode
   = LinearNode Dimension Int AABB
   | LinearLeaf AnyPrim AABB

data LinearBvh
   = MkLinearBvh (Array Int LinearNode)

I haven't tried this yet, but I fear the performance would still be sub-par, because I can't store LinearNode values in a UArray, nor could I store the Int that indexes the right child together with the Float values that make up the AABB in a single UArray (correct me if I got this wrong). Using two arrays would mean bad cache coherency. So I'm basically looking for a way to store my tree efficiently so that I can expect good traversal performance. It should

be compact
have good locality properties
work with recent GHC compilers
go through as few indirections as possible (going through a "thunk" can't help performance, so I think "unboxed" types would help)
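
For reference, the flattening step described above could be sketched as follows. This is only an illustration under stated assumptions: the `Dimension`, `AABB` and `AnyPrim` definitions are placeholders (the question does not show the real ones), `size` and `flatten` are hypothetical helper names, and the result is still a boxed `Array`, so it shows the pre-order layout rather than solving the unboxing problem.

```haskell
import Data.Array  -- from the array package that ships with GHC

-- Placeholder types: the question's real definitions are not shown.
data Dimension = X | Y | Z deriving (Eq, Show)
data AABB = AABB Float Float Float Float Float Float deriving (Eq, Show)
type AnyPrim = Int  -- assumption: a primitive is referenced by an index

data TreeBvh
   = Node Dimension TreeBvh TreeBvh AABB
   | Leaf AnyPrim AABB

data LinearNode
   = LinearNode Dimension Int AABB  -- Int: array index of the right child
   | LinearLeaf AnyPrim AABB
   deriving (Eq, Show)

newtype LinearBvh = MkLinearBvh (Array Int LinearNode)

-- Number of nodes in a subtree.
size :: TreeBvh -> Int
size (Leaf _ _)     = 1
size (Node _ l r _) = 1 + size l + size r

-- Pre-order flattening: a node at index i is followed directly by its
-- left subtree (starting at i + 1); the right subtree starts after it.
-- Recomputing 'size' at every level keeps the sketch short but is not
-- the cheapest way to do this for deep trees.
flatten :: TreeBvh -> LinearBvh
flatten t = MkLinearBvh (listArray (0, size t - 1) (go 0 t []))
  where
    go :: Int -> TreeBvh -> [LinearNode] -> [LinearNode]
    go _ (Leaf p b) rest = LinearLeaf p b : rest
    go i (Node d l r b) rest =
      let rightIx = i + 1 + size l
      in  LinearNode d rightIx b : go (i + 1) l (go rightIx r rest)
```

During traversal, the left child of the node at index `i` is simply `i + 1`, and the right child is the stored `Int`.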

Flagging this, I want to see what people that know more than I say. Good question though so plus 1. — Robert Massaioli, Feb 22 '11 at 11:09
I think Robert meant 'favoriting', cause that's what he did. — antonakos, Feb 22 '11 at 11:42
Have you collected profiling data? You may find that performance bottlenecks in Haskell are significantly different than what you're used to from C. I would do that before flattening the tree or trying to unbox values. — John L, Feb 22 '11 at 17:20
@John: I did, I spend ~50% cycles traversing the tree, which is a bit too much for my simple scene. — Waldheinz, Feb 22 '11 at 17:38
@Waldheinz ~50% of the time even after unboxing and adding strictness annotations or was that with the tree shown above? — Thomas M. DuBuisson, Feb 22 '11 at 18:58
@waldheinz Not having a running copy I can't test, but would bet adding strictness and `{-# UNPACK #-}` pragma would help a lot. — Thomas M. DuBuisson, Feb 22 '11 at 20:34
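
The strictness and `{-# UNPACK #-}` suggestion from the comments might look roughly like the sketch below. The field layout of `AABB` (six `Float`s for the two corners), the `AnyPrim` alias and the `countLeaves` helper are assumptions made for illustration; whether this actually reduces the traversal cost would have to be measured with a profiler.

```haskell
data Dimension = X | Y | Z deriving (Eq, Show)

-- Assumed AABB layout: min corner followed by max corner.
-- Strict, unpacked fields are stored inline instead of as pointers to
-- boxed Floats.
data AABB = AABB
  {-# UNPACK #-} !Float {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  {-# UNPACK #-} !Float {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  deriving (Eq, Show)

type AnyPrim = Int  -- assumption: a primitive is referenced by an index

-- Strict fields mean a constructed node never holds thunks; the
-- single-constructor AABB can additionally be unpacked into the node.
-- The recursive children cannot be unpacked and remain pointers.
data TreeBvh
   = Node !Dimension !TreeBvh !TreeBvh {-# UNPACK #-} !AABB
   | Leaf !AnyPrim {-# UNPACK #-} !AABB

-- Small traversal used only to exercise the type.
countLeaves :: TreeBvh -> Int
countLeaves (Leaf _ _)     = 1
countLeaves (Node _ l r _) = countLeaves l + countLeaves r
```

Note that strict child fields force the whole tree when it is built, which is usually acceptable for a hierarchy that is constructed once and then traversed many times.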

snk_kid · score 4 · Answer 1 · 2011-02-22T13:09:59.673

If I understood you correctly, you want unboxed arrays of user-defined types? If so, check out the vector package, which also supports loop fusion. The slides for High-Performance Haskell are worth a look as well.

edited Feb 22 '11 at 13:09

answered Feb 22 '11 at 13:03

The vector package seems appealing. Could you please give a little code sample outlining what the `LinearNode` and `LinearBvh` types would look like? I would have to implement `Storable` for `LinearNode`, right? – Waldheinz Feb 22 '11 at 14:36
@Waldheinz You don't need to use storable vectors, those are meant for FFI C interop (but you can use them still). You need to implement an instance of Unbox typeclass for Data.Vector.Unboxed and Storable for the Storable versions. There is an example in the docs here: http://hackage.haskell.org/packages/archive/vector/0.7.0.1/doc/html/Data-Vector-Unboxed.html and there is a tutorial for the package here: http://haskell.org/haskellwiki/Numeric_Haskell:_A_Vector_Tutorial – snk_kid Feb 22 '11 at 14:49
Note that this style turns unboxed vectors of pairs into pairs of unboxed vectors. I tend to think this won't hurt cache, but the OP raised a question about this. Another approach would be to implement Unboxed instances for *adaptive* values: http://hackage.haskell.org/package/adaptive-containers – sclv Feb 22 '11 at 15:00
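
As a rough illustration of the `Storable` route discussed in these comments, a flat 32-byte node could be given an instance along the following lines. Everything here is an assumption made for the sketch: the `FlatNode` record, its field names, the tag encoding and the `roundTrip` helper are hypothetical and do not come from the question. The example uses only `base`; a type with such an instance should also be usable as the element type of `Data.Vector.Storable` from the vector package, which keeps each node's floats and index together in one contiguous buffer.

```haskell
import Data.Int (Int32)
import Foreign.Marshal.Array (peekArray, withArray)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical flat node: six Floats for the box plus two 32-bit ints.
data FlatNode = FlatNode
  { fnMinX, fnMinY, fnMinZ, fnMaxX, fnMaxY, fnMaxZ :: !Float
  , fnTag :: !Int32  -- assumed encoding: 0,1,2 = split axis X,Y,Z; 3 = leaf
  , fnRef :: !Int32  -- inner node: right child index; leaf: primitive index
  } deriving (Eq, Show)

instance Storable FlatNode where
  sizeOf _    = 32  -- 8 fields of 4 bytes each
  alignment _ = 4
  peek p = do
    let pf   = castPtr p :: Ptr Float
        pInt = castPtr p :: Ptr Int32
    FlatNode
      <$> peekElemOff pf 0 <*> peekElemOff pf 1 <*> peekElemOff pf 2
      <*> peekElemOff pf 3 <*> peekElemOff pf 4 <*> peekElemOff pf 5
      <*> peekElemOff pInt 6 <*> peekElemOff pInt 7
  poke p (FlatNode a b c d e f tag ref) = do
    let pf   = castPtr p :: Ptr Float
        pInt = castPtr p :: Ptr Int32
    mapM_ (uncurry (pokeElemOff pf)) (zip [0 ..] [a, b, c, d, e, f])
    pokeElemOff pInt 6 tag
    pokeElemOff pInt 7 ref

-- Write the nodes into one contiguous buffer and read them back.
roundTrip :: [FlatNode] -> IO [FlatNode]
roundTrip ns = withArray ns (peekArray (length ns))
```

Because `Float` and `Int32` are both four bytes wide, element offsets 6 and 7 of the `Int32` pointer address the last two words of the node.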

score 2 · Answer 2 · answered Feb 22 '11 at 14:12

I should really point out that Haskell is not very good at giving the programmer a means of choosing data layout in memory.

You might be interested in storing the tree in a flat array in cache-oblivious way ("Van Emde Boas tree"). It should work, but who knows. :)

(shameless plug: I've made a similar effort some time ago; I've used some advanced type system features of the ATS programming language to make the raytracer both safer and faster; see the code here: http://code.google.com/p/ats-miscellanea/ -- I didn't go very far yet, unfortunately)

score 0 · Answer 3 · answered Nov 09 '11 at 14:02

What you're proposing was discovered years ago; it's called a bounding interval hierarchy (BIH).

Engineer

I'm aware of the BIH, but I don't really see how this answer fits my question. I'm asking for an efficient in-memory representation of a BVH. If I had a BIH, I'd probably ask for an efficient representation of that, though. – Waldheinz Nov 10 '11 at 07:29
Apologies; I didn't read your question carefully, it was late. I just saw, "the "left" child immediately follows it's parent" and assumed you were talking about the sort of implicit by-axis ordering which BIHs use to greatly enhance performance. Is there any reason you'd rather not use BIH over BVH? They're really just a qualitative improvement of BVHs, for the most part, although obviously your requirements must be taken into account. – Engineer Nov 10 '11 at 13:52
Actually, I've already moved on to a kd-tree, as in my experience this performs best for tracing tons of rays through static scenes. I'm still looking for a good way to represent it in memory (the usual 8 bytes per node or something like that), though. – Waldheinz Nov 10 '11 at 14:28
"Static scenes" is the key phrase there, I agree then with your choice of KD-trees (presume you're using SAH?). That's one benefit RLE-type approaches have over others... the memory consumption is absolutely minimal (depends on scene fragmentation though). Of course, it's hard to apply this outside of a terrain-heightmap vscan raycaster. Good luck. – Engineer Nov 10 '11 at 15:06
