Where is my kd tree traversal code wrong?

Question

I was optimizing my c++ raytracer. I'm tracing single rays through kdtrees. So far I was using Havran's recursive algorithm 'B', which seems antique and overblown for OOP. My new code is as short as possible (and hopefully more easily optimized by the compiler AND more easily maintained):

    struct StackElement{
        KDTreeNode<PT>* node;
        float tmax;
        array<float, 3> origin;
    };

    //initializing explicit stack
    stack<StackElement> mystack;

    //initialize local variables
    KDTreeNode<PT>* node = tree.root;
    array<float, 3> origin {ray.origin[0], ray.origin[1], ray.origin[2]};
    const array<float, 3> direction {ray.direction[0], ray.direction[1], ray.direction[2]};
    const array<float, 3> invDirection {1.0f / ray.direction[0], 1.0f / ray.direction[1], 1.0f / ray.direction[2]};
    float tmax = numeric_limits<float>::max();
    float tClosestIntersection = numeric_limits<float>::max();
    bool notFullyTraversed = true;

    while(notFullyTraversed) {
        if (node->isLeaf()) {

            //test all primitives inside the leaf
            for (auto p : node->primitives()) {
                p->intersect(ray, tClosestIntersection, intersection, tmax);
            }

            //test if leaf + empty stack => return
            if (nodeStack.empty()) {
                notFullyTraversed = false;
            } else {
                //pop all stack
                origin = mystack.top().origin;
                tmax = mystack.top().tmax;
                node = mystack.top().node;
                mystack.pop();
            }
        } else {
            //get axis of node and its split plane
            const int axis = node->axis();
            const float plane = node->splitposition();

            //test if ray is not parallel to plane
            if ((fabs(direction[axis]) > EPSILON)) {
                const float t = (plane - origin[axis]) * invDirection[axis];

                //case of the ray intersecting the plane, then test both childs
                if (0.0f < t && t < tmax) {
                    //traverse near first, then far. Set tmax = t for near
                    tmax = t;
                    //push only far child onto stack
                    mystack.push({
                                 (origin[axis] > plane )  ? node->leftChild() : node->rightChild() ,
                                 tmax - t,
                                 {origin[0] + direction[0] * t, origin[1] + direction[1] * t, origin[2] + direction[2] * t}
                                 });
                }
            }
            //in every case: traverse near child first
            node = (origin[axis] > plane) ? node->rightChild() : node->leftChild();

        }
    }

    return intersection.found;

It's not traversing the far child often enough. Where do I miss a related case?

First, don't make too much effort in trying to "beat the compiler" for speed by writing code you may think is "speedier". You may wind up having the compiler's optimizer not touch the code due to aliasing issues. Second, have you used a debugger to see what the issue is and where your code takes the wrong path? Last, `So far I was using Havran's recursive algorithm 'B', which seems antique and overblown for OOP` So do you have a program using this recursive algorithm, and if so, does it work? If it does, run it side-by-side against your new program to see where your new program goes awry. — PaulMcKenzie, Aug 18 '14 at 16:04
When you used the debugger, what part of your code was causing the issue? You did use a debugger before posting? Have you tried changing the number of rasters processed to help find where it went awry? — Thomas Matthews, Aug 18 '14 at 16:30
Also: Use a single stack and push a structure, instead of trying to keep three stacks in sync. Will give you better locality. — Anteru, Aug 18 '14 at 19:20
@Anteru: great tip. For the debugging: I didn't try that before posting because it seemed like a needle in a haystack. There are no problems with simpler models, and it needs more than a small amount of rays. My debugging skills are not that great, but I tried and spend the last hour or so following it. The problem is: I know there are differences. I'm not pushing the far node often enough, so I'm definitely missing that case too often. btw: the pictures are not correct. I found a bug in my kd tree construction so it only split on one axis. That's why it looks like a slice of the teapot. — 5-to-9, Aug 18 '14 at 19:39
It shouldn't be a needle in a haystack. First task is to find the *simplest, smallest possible* input case it fails on. Then, add copious debugging output (or step through in the debugger), to allow you to compare progress against a hand calculation. This should very quickly allow you to identify the first point at which behaviour diverges from expectations, and then you can work backward from there. — Oliver Charlesworth, Aug 18 '14 at 19:51

score 0 · Answer 1 · answered Aug 19 '14 at 09:45

One problem was small (original, wrong code):

      //traverse near first, then far. Set tmax = t for near
      tmax = t;
      //push only far child onto stack
      mystack.push({ ... , tmax - t,  ...    });

it always pushes 0.0f onto the stack as the exit distance for the far node, meaning no positive t is accepted for intersections.

Swapping both lines fixes that problem.

My resurcive stack trace / decisions are still different (Havran takes about ~25% more iterations) with an output picture having 99,5% of the same pixels. That's inside floating point rounding issues, but it still does not answer the question: What case is not recognized by that simplified implementation? Or what operation is not (numerically) stable enough in this version?

Where is my kd tree traversal code wrong?

1 Answers1