In my CPU ray tracer (well, path tracer), most of the CPU time is spent in the BVH traversal function. According to my profiler, 75% of the time spent raytracing is in this function and the functions it calls: 35% in the function itself and the other 40% in the intersection tests it calls.
Basically, the code does a DFS traversal through all the bounding boxes and triangles that the ray intersects. It uses a fixed-size array on the stack to hold the nodes still to be explored (BVHSTACKSIZE is set to 32; most of that space is never needed) so that no memory is dynamically allocated. However, it seems crazy to me that 35% of the time is spent here. I've spent a while optimizing the code and it's currently the fastest I've been able to make it, but it's still the largest single bottleneck in my program.
Does anyone have tips for optimizing this even more? I already have a decent BVH construction algorithm so I don't think I'd get any speedup by using a different BVH. Does anyone have tips on how to best do line-by-line profiling on a Mac?
For reference, this code on an example scene takes anywhere from <1 microsecond to 40 microseconds depending on the number of intersections, and the while loop is run for 1 to ~400 iterations (also depending on the number of intersections).
Thanks!
bool BVHAccel::intersect(Ray& ray) const {
    bool hit = false;
    BVHNode* to_intersect[BVHSTACKSIZE];
    int head = 0;
    to_intersect[head++] = root;
    while (head != 0) {
        assert(head < BVHSTACKSIZE);
        BVHNode* cur = to_intersect[--head];
        if (cur->bb.intersect(ray)) { // Does not modify the ray
            if (cur->isLeaf()) {
                for (const auto& primitive : cur->primitives) {
                    hit |= primitive->intersect(ray); // Modifies the ray!
                }
            } else {
                to_intersect[head++] = cur->r;
                to_intersect[head++] = cur->l;
            }
        }
    }
    return hit;
}
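To make the stack behaviour easier to check in isolation, here is a minimal sketch of the same explicit-stack DFS on a toy tree. It is not my real code: `Node` is a hypothetical 1D stand-in (an interval instead of a box), and `Stats`, `traverse`, and `kStackSize` are names made up for this example. It only counts loop iterations and the peak number of stack slots in use, which are the two quantities mentioned above.

```cpp
#include <cassert>

constexpr int kStackSize = 32;  // hypothetical; mirrors BVHSTACKSIZE

// Hypothetical 1D stand-in for a BVH node: an interval instead of a box.
struct Node {
    double lo, hi;
    Node* l;
    Node* r;
    bool isLeaf() const { return l == nullptr && r == nullptr; }
};

struct Stats {
    int iterations;  // how many times the while loop body ran
    int leaves_hit;  // leaves whose interval contains the query point
    int peak_stack;  // largest number of stack slots in use at once
};

// Same control flow as the traversal above, with a point-in-interval
// test in place of the ray/box test and a counter in place of the
// primitive intersection loop.
Stats traverse(Node* root, double x) {
    Stats s{0, 0, 0};
    Node* stack[kStackSize];
    int head = 0;
    stack[head++] = root;
    while (head != 0) {
        ++s.iterations;
        Node* cur = stack[--head];
        if (x < cur->lo || x > cur->hi) continue;  // missed this node
        if (cur->isLeaf()) {
            ++s.leaves_hit;
            continue;
        }
        assert(head + 2 <= kStackSize);  // room for both children
        stack[head++] = cur->r;
        stack[head++] = cur->l;  // pushed last, so popped first
        if (head > s.peak_stack) s.peak_stack = head;
    }
    return s;
}
```

On a three-leaf tree (root over an inner node and one leaf), a query that lands in the first leaf runs the loop five times and never has more than three entries on the stack, which matches my observation that most of the 32 slots go unused.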
// Ensures a <= b. Defined before BBox::intersect, which uses it.
void ascending(double& a, double& b) {
    if (a > b) {
        std::swap(a, b);
    }
}

bool BBox::intersect(const Ray& r) const {
    // Parametric distances to the two planes of each axis-aligned slab.
    double txmin = (min.x - r.o.x) * r.inv_d.x;
    double txmax = (max.x - r.o.x) * r.inv_d.x;
    double tymin = (min.y - r.o.y) * r.inv_d.y;
    double tymax = (max.y - r.o.y) * r.inv_d.y;
    double tzmin = (min.z - r.o.z) * r.inv_d.z;
    double tzmax = (max.z - r.o.z) * r.inv_d.z;
    ascending(txmin, txmax);
    ascending(tymin, tymax);
    ascending(tzmin, tzmax);
    // The ray is inside the box between the latest entry and the earliest exit.
    double t0 = std::max(txmin, std::max(tymin, tzmin));
    double t1 = std::min(txmax, std::min(tymax, tzmax));
    if (t1 < t0 || t0 > r.max_t || t1 < r.min_t) {
        return false;
    }
    return true;
}
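For anyone who wants to run the box test on its own, here is a self-contained sketch. The `Vec3` and `Ray` definitions and the `make_ray` helper are hypothetical stand-ins (my real types are not shown here), and this variant swaps the conditional `std::swap` for `std::min`/`std::max`. It should give the same result as the version above for non-NaN inputs, but I have not measured whether it is any faster.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical stand-ins for the types used in the post.
struct Vec3 { double x, y, z; };

struct Ray {
    Vec3 o;        // origin
    Vec3 inv_d;    // component-wise reciprocal of the direction
    double min_t;  // start of the valid parameter range
    double max_t;  // end of the valid parameter range
};

// Hypothetical helper: builds a ray and precomputes 1/direction.
// This sketch assumes every direction component is non-zero.
Ray make_ray(Vec3 o, Vec3 d, double min_t, double max_t) {
    return Ray{o, Vec3{1.0 / d.x, 1.0 / d.y, 1.0 / d.z}, min_t, max_t};
}

struct BBox {
    Vec3 min, max;
    bool intersect(const Ray& r) const;
};

// Slab test: same structure as the original, but the per-axis ordering
// is done with std::min/std::max instead of a conditional swap.
bool BBox::intersect(const Ray& r) const {
    double tx0 = (min.x - r.o.x) * r.inv_d.x;
    double tx1 = (max.x - r.o.x) * r.inv_d.x;
    double ty0 = (min.y - r.o.y) * r.inv_d.y;
    double ty1 = (max.y - r.o.y) * r.inv_d.y;
    double tz0 = (min.z - r.o.z) * r.inv_d.z;
    double tz1 = (max.z - r.o.z) * r.inv_d.z;
    // Latest entry across the three slabs.
    double t0 = std::max({std::min(tx0, tx1), std::min(ty0, ty1),
                          std::min(tz0, tz1)});
    // Earliest exit across the three slabs.
    double t1 = std::min({std::max(tx0, tx1), std::max(ty0, ty1),
                          std::max(tz0, tz1)});
    return t0 <= t1 && t0 <= r.max_t && t1 >= r.min_t;
}
```

With a unit box from (0,0,0) to (1,1,1), a ray from (-1,-1,-1) along (1,1,1) enters at t = 1 and exits at t = 2, so it hits when max_t is large and is rejected when max_t is below 1.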