I'm currently trying to implement a path tracer inside a fragment shader that uses a very simple BVH.

The code for the BVH intersection is based on the following idea:

bool BVHintersects( Ray ray ) {

    Object closestObject;

    vec2 toVisit[100]; // using a stack to keep track which node should be tested against the current ray
    int stackPointer = 1;
    toVisit[0] = vec2(0.0, 0.0);  // coordinates of the root node in the BVH hierarchy



    while(stackPointer > 0) {
        stackPointer--;  // pop the BVH node to examine

        if(!leaf) {
            // examine the BVH node and eventually update the stackPointer and toVisit
        }

        if(leaf) {
            // examine the leaf and eventually update the closestObject entry
        }
    }
}

The problem with the above code is that something very strange starts to happen on the second light bounce, assuming I'm calculating light bounces this way:

vec3 color  = vec3(0.0);
vec3 normal = vec3(0.0);
// first light bounce
bool intersects = BVHintersect(ro, rd, color, normal);

vec3 lightPos = vec3(5, 15, 0);

// updating ray origin & direction
ro = ro + rd * (t - 0.01);
rd = normalize(lightPos - ro);

// second light bounce used only to calculate shadows
bool shadowIntersects = BVHintersect(ro, rd, color, normal);

The second call to BVHintersect runs indefinitely because the while loop never exits. Yet from the many tests I've run on that second call, I'm sure that stackPointer eventually goes back to 0. In fact, if I add an iteration cap to the while loop like this:

int iterationsMade = 0;
while(stackPointer > 0) {
    iterationsMade++;
    if(iterationsMade > 100) {
        break;
    }
    // the rest of the loop
}
// when the function ends, it also returns "iterationsMade"

the variable "iterationsMade" always stays under 100, so the while loop doesn't run forever. Performance-wise, however, it's as if all 100 iterations ran, even though "iterationsMade" is never bigger than, say, 10 or 20. Increasing the hardcoded "100" to a bigger value degrades performance linearly.

What could cause this behaviour? Why would that second call to BVHintersect get stuck inside the while loop if it never does more than 10-20 iterations?

Source for the BVHintersect function: https://pastebin.com/60SYRQAZ

Domenico
  • A stack size of 100 is quite large, no good BVH would need one that big really except in some absolute worst case, so you should probably reduce that to something more reasonable like 16 for performance. You likely just have a bug somewhere that is causing it to improperly traverse the tree if it never stops with that original code. – Lemon Drop Jul 01 '19 at 18:49
  • @LemonDrop the while loop never does more than 10-20 iterations, "iterationsMade" is never bigger than that in any pixel computed, thank you for your suggestion about the stack size, I've been using some arbitrarily high numbers just for testing but I'll make sure to shrink that one down – Domenico Jul 01 '19 at 18:53
  • Your description makes no sense then, if the while loop was exiting after 10-20 iterations it would not continue forever. To me it sounds like it's only exiting the loop after 100 iterations and that is why increasing that value will change the performance. Are you sure you're reading the iterationsMade value correctly? – Lemon Drop Jul 01 '19 at 18:55
  • I'm using an inout variable in the function declaration, passed back to the main thread in the fragment shader, where I simply output a red color for that pixel if iterationsMade is bigger than a set constant. No red pixels are computed after iterationsMade gets bigger than 15-20. Also, as specified in another comment, the hardcoded value "100" doesn't affect the first call of BVHintersects at all: in the first bounce it could be as high as "10000" without degrading performance (and still computing the right result) – Domenico Jul 01 '19 at 18:59
  • Well I don't know, it could be many things and to me it just sounds like there's a bug somewhere since the results don't work. It could be something like the primary rays are facing a fine direction too but secondary rays when they bounce escape the BVH into the sky or something which causes the loop to never end, there's just not enough info really to tell. – Lemon Drop Jul 01 '19 at 19:04
  • @LemonDrop thank you for your help I'll make sure to also check for that case as well – Domenico Jul 01 '19 at 19:08

1 Answer


So, there's a funny thing about loops in shaders (or most SIMD circumstances):

The entire wave will take at least as long to execute as the slowest thread. So, if one thread needs to take ~100 iterations, then they ALL take 100 iterations. Depending on your platform and compiler, the loop may be unrolled to 100 iterations (or whatever upper bound you choose). Anything after the break won't affect the final output, but the rest of the unrolled loop will still have to be processed. Early-out isn't always possible.
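
As a rough mental model of the paragraph above, here is a minimal GLSL ES 3.00 sketch. `MAX_ITERATIONS`, `doStep`, `loopWithBreak`, `loopPredicated`, and the per-pixel trip count are all hypothetical names made up for illustration; none of them come from the question's code. The second function is only a picture of how an unrolled or predicated loop may behave, since what a given compiler and GPU actually do is platform-specific.

```glsl
#version 300 es
precision highp float;

// Hypothetical upper bound, playing the role of the hard-coded "100".
const int MAX_ITERATIONS = 100;

out vec4 fragColor;

// Hypothetical per-iteration work standing in for one BVH node test.
float doStep(float acc, int i) {
    return acc + 1.0 / float(i + 1);
}

// Form 1: the loop as written in the source, with an early break.
float loopWithBreak(int needed) {
    float acc = 0.0;
    for (int i = 0; i < MAX_ITERATIONS; ++i) {
        if (i >= needed) break;
        acc = doStep(acc, i);
    }
    return acc;
}

// Form 2: the same computation written the way an unrolled or
// predicated loop behaves: every iteration up to the bound is issued,
// and a flag only decides whether its result is kept.
float loopPredicated(int needed) {
    float acc = 0.0;
    for (int i = 0; i < MAX_ITERATIONS; ++i) {
        bool laneHasWork = i < needed;
        float next = doStep(acc, i);
        acc = laneHasWork ? next : acc;
    }
    return acc;
}

void main() {
    // Hypothetical trip count that varies per pixel, from 1 to 20.
    int needed = 1 + int(mod(gl_FragCoord.x, 20.0));
    float a = loopWithBreak(needed);
    float b = loopPredicated(needed);
    // Red shows any difference between the two forms; green shows the result.
    fragColor = vec4(abs(a - b), a * 0.25, 0.0, 1.0);
}
```

Both functions are meant to produce the same value for every pixel, but the second one pays for `MAX_ITERATIONS` steps regardless of `needed`, which is the kind of cost that grows linearly with the bound.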

There are a number of ways around this, but perhaps the most straightforward is to do this in multiple passes with a lower max iterations value.
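
A minimal sketch of the multi-pass idea, assuming the per-pixel state fits into a single RGBA texel. `STEPS_PER_PASS`, `TOTAL_STEPS`, `uPrevState`, and `nextState` are hypothetical names, and the arithmetic inside the loop is only a stand-in for real work. A full BVH traversal would also need to persist its stack (or use a stackless scheme), which this sketch does not attempt.

```glsl
#version 300 es
precision highp float;

// Hypothetical per-pass budget, much smaller than the original bound.
const int STEPS_PER_PASS = 8;
// Hypothetical total amount of work each pixel has to get through.
const float TOTAL_STEPS = 100.0;

// State written by the previous pass (assumed to be a floating-point,
// ping-ponged render target): x = steps completed so far, y = running result.
uniform sampler2D uPrevState;

out vec4 nextState;

void main() {
    vec4 state = texelFetch(uPrevState, ivec2(gl_FragCoord.xy), 0);
    float done = state.x;
    float result = state.y;

    // Each pass advances a pixel by at most STEPS_PER_PASS steps.
    for (int i = 0; i < STEPS_PER_PASS; ++i) {
        if (done >= TOTAL_STEPS) break;   // this pixel has finished
        result += 1.0 / (done + 1.0);     // stand-in for one unit of work
        done += 1.0;
    }

    // Store the progress so the next pass can resume from it.
    nextState = vec4(done, result, 0.0, 1.0);
}
```

The host would swap the two state textures and redraw until every pixel reports `done >= TOTAL_STEPS`. The trade-off is extra bandwidth and draw calls in exchange for a small, predictable loop bound in each pass.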

I would also run your shader through a compiler and look at the generated code. Compare versions built with different max iteration values and look at things like the length of the compiled shader.

See this answer for a little more information.

3Dave
  • Thank you for your answer 3Dave, I immediately thought about loop unrolling but the interesting part is that changing "100" with "10000" doesn't affect at all the first bounce computation, only the second one. Do you think this could rule out loop unrolling completely as a possible cause of this problem? – Domenico Jul 01 '19 at 18:45
  • @Domenico That's an interesting point. I long ago stopped trying to guess what the shader compiler is doing. When you say "second bounce," do you mean the `intersectsChild2` path? – 3Dave Jul 01 '19 at 18:53
  • Sorry if I wasn't clear about the meaning of "second bounce". I was referring to the second call of BVHintersect which is used to simply calculate shadow rays – Domenico Jul 01 '19 at 19:00
  • To be even more specific, if I comment out `bool shadowIntersects = BVHintersect(ro, rd, color, normal);` I could place whatever hardcoded value in place of "100" without seeing any degradation in performance – Domenico Jul 01 '19 at 19:02
  • @Domenico Could you update your post with the real code that calls BVHIntersect? The one you posted only includes 4 parameters. – 3Dave Jul 01 '19 at 19:12
  • I've made various versions to test different things, would you like to see how I'm checking if iterationsMade is smaller than the hardcoded value? – Domenico Jul 01 '19 at 19:17
  • @Domenico maybe. Another thing (probably unrelated): The ` if (child1T < child2T) {` inside of your ` if(intersectsChild1 && intersectsChild2) {` test. Both branches appear to do the same thing. – 3Dave Jul 01 '19 at 19:20
  • The order in which child nodes are pushed into the stack is important, those branches actually do something different (zw vs. xy!) Main: https://pastebin.com/b6jbGdMe BVHIntersect: https://pastebin.com/UEhYWy2J I'm also not opposed to share a live page so you can see it in your browser if you're curious – Domenico Jul 01 '19 at 23:49
  • EDIT: you guys were right, the check for "iterationsMade" was wrong, as you can see from my source. As soon as I make sure that there's something wrong in the BVH tree traversal I'll mark this answer as correct – Domenico Jul 01 '19 at 23:56
  • It was a combination of branched execution with unexpected parameters (since branches are always executed if another thread in the wave passes the test) and an error in detecting how many iterations were made inside the BVH traversal, thank you all for your help – Domenico Jul 02 '19 at 13:55
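
Since part of the problem turned out to be an incorrect iteration check, one way to make such a check harder to get wrong is to draw the count itself instead of comparing it against a threshold. The following is a minimal GLSL ES 3.00 sketch under that assumption. `traverse`, `pending`, and the fake workload are hypothetical stand-ins, not the code from the pastebin links.

```glsl
#version 300 es
precision highp float;

// Hypothetical cap, playing the role of the hard-coded "100".
const int MAX_ITERATIONS = 100;

out vec4 fragColor;

// Hypothetical stand-in for the traversal. It reports the number of
// loop iterations through an out parameter and returns whether the
// loop finished on its own instead of hitting the cap.
bool traverse(vec2 pixel, out int iterationsMade) {
    iterationsMade = 0;
    // Fake workload: 1 to 30 pending nodes depending on the pixel.
    int pending = 1 + int(mod(pixel.x + pixel.y, 30.0));
    while (pending > 0) {
        if (iterationsMade >= MAX_ITERATIONS) break;
        iterationsMade++;   // counted exactly once per iteration, before any work
        pending--;
    }
    return pending == 0;
}

void main() {
    int iterationsMade;
    bool finished = traverse(gl_FragCoord.xy, iterationsMade);

    // Brightness encodes the iteration count; blue marks capped pixels.
    float heat = float(iterationsMade) / float(MAX_ITERATIONS);
    fragColor = vec4(heat, heat, finished ? 0.0 : 1.0, 1.0);
}
```

A heat map like this shows the whole distribution of iteration counts at once, so a handful of pixels that reach the cap stand out even when most of the image stays well below it.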