
I'm writing a 3D raytracer as a personal learning project (Enlight) and have run into an interesting problem related to doing intersection tests between a ray and a scene of objects.

The situation is:

  • I have a number of primitives that rays can intersect with (spheres, boxes, planes, etc.) and groups thereof. Collectively I'm calling these scene objects.
  • I want to be able to position scene objects / primitives with arbitrary affine transformations by wrapping them in a Transform object (importantly, this enables multiple instances of the same primitive(s) to be used at different positions in the scene, since primitives are immutable)
  • Scene objects may be stored in a bounding volume hierarchy (i.e. I'm doing spatial partitioning)
  • My intersection tests work with Ray objects that represent a partial ray segment (start vector, normalised direction vector, start distance, end distance)

The problem is that when a ray hits the bounding box of a Transform object, it looks like the only way to do an intersection test with the transformed primitives contained within is to transform the Ray into the transformed co-ordinate space. This is easy enough, but then if the ray doesn't hit any transformed objects I need to fall back to the original Ray to continue the trace. Since Transforms may be nested, this means I have to maintain a whole stack of Rays for each intersection trace that is done.
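For concreteness, the transform step I mean looks something like this minimal sketch (Python/NumPy with hypothetical names; it assumes each Transform holds a cached 4x4 inverse of its affine matrix):

```python
import numpy as np

def transform_ray(ray_origin, ray_dir, inverse_matrix):
    """Map a world-space ray into a Transform's local space.

    inverse_matrix is the 4x4 inverse of the object's affine transform.
    The origin is transformed as a point (w=1) and the direction as a
    vector (w=0), so translation affects only the origin.
    """
    local_origin = (inverse_matrix @ np.append(ray_origin, 1.0))[:3]
    local_dir = (inverse_matrix @ np.append(ray_dir, 0.0))[:3]
    return local_origin, local_dir
```

Note that under a non-uniform scale the transformed direction is no longer unit length, which is exactly why falling back to the original Ray (or a stack of them) seems necessary after a miss.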

This is of course within the inner loop of the whole application and the primary performance bottleneck. It will be called millions of times a second so I'm keen to minimise complexity / avoid unnecessary memory allocation.

Is there a clever way to avoid having to allocate new Rays / keep a Ray stack?

Or is there a cleverer way of doing this altogether?

mikera
    I'm not sure this would be faster than the memory allocations, but you could try to come up with an efficient transform inversion algorithm and then just multiply the current ray with the inverse transform when backing off from the current object. – Ivan Vergiliev Dec 18 '12 at 00:06
  • @Ivan - interesting idea. I guess it might be marginally faster, though I'd be worried then about compounding numerical precision problems..... – mikera Dec 18 '12 at 00:29
  • You could precompute and cache a transform and inverse transform (i.e. a matrix object) for each object (as well as objects in the group) that will convert to and from the global frame. This way you don't need a nested hierarchy as you can do a hit test on each object directly. I.e. transform the ray to the frame of the object, then transform back to get the hit point in global frame. I do this in my tracer: http://github.com/danieljfarrell/pvtrace – Daniel Farrell Dec 18 '12 at 05:20

2 Answers


Most of the time in ray-tracing you have a few (hundred, thousand) objects and quite a few more rays. Probably millions of rays. That being the case, it makes sense to see what kind of computation you can spend on the objects in order to make it faster/easier to have the rays interact with them.

A cache will be very helpful as boyfarrell suggested. It might make sense to not only create the forward and reverse transforms on the objects that would move them to or from the global frame, but also to keep a copy of the object in the global frame. It makes it more expensive to create objects or move them (because the transform changes and so do the cached global frame copies) but that's probably okay.

If you cast N rays at M objects and N >> M, then it stands to reason that every object will be hit by multiple rays. If we assume every ray hits an object, then each object is struck by N/M rays on average. That means transforming N/M rays into each object's space, hit testing, and possibly reversing the transform back out: N/M transforms per object at minimum. But if we cache a copy of the object already transformed into the global frame, a single transform per object suffices and no per-ray transforms are needed, at least for the hit-testing.
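The caching idea might look like this sketch (Python/NumPy, hypothetical names): the inverse is computed once when the object is created or moved, never per ray.

```python
import numpy as np

class TransformedObject:
    """Illustrative scene object caching both transforms up front.

    'forward' maps local -> world coordinates; its inverse is computed
    once at construction time, so the per-ray cost is a single matrix
    multiply rather than a matrix inversion.
    """
    def __init__(self, forward):
        self.forward = forward
        self.inverse = np.linalg.inv(forward)  # paid once, not per ray

    def to_local(self, point):
        return (self.inverse @ np.append(point, 1.0))[:3]

    def to_world(self, point):
        return (self.forward @ np.append(point, 1.0))[:3]
```

Moving the object means rebuilding `forward`, `inverse`, and any cached global-frame geometry, which is the trade-off described above.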

Mike Sandford
  • 1,315
  • 10
  • 22
1

Define your primitives in their base form (unit scale, centred on (0,0,0), unrotated) and place them in the scene using transformations only. Cache the complete forward and reverse transformations in each object. (Do not forget the normal vectors; you will need them for reflections.)

This gives you the ability to test the hit using simplified math (you reverse-transform the ray into object space and compute the hit against the base-form object) and then transform the hit point, and any reflection vector, back into world space using the forward transform.
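A sketch of that round trip (Python/NumPy, hypothetical names; assumes a unit sphere at the origin as the base form and cached 4x4 forward/inverse matrices):

```python
import numpy as np

def hit_unit_sphere(origin, direction):
    """Nearest positive intersection distance with the unit sphere at
    the origin, or None. Standard quadratic: |o + t*d|^2 = 1."""
    a = direction @ direction
    b = 2.0 * (origin @ direction)
    c = origin @ origin - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    t = (-b - np.sqrt(disc)) / (2.0 * a)
    return t if t > 0 else None

def hit_transformed_sphere(world_origin, world_dir, forward, inverse):
    """Reverse-transform the ray, hit the base-form sphere, then map
    the hit point back to world space with the forward transform."""
    o = (inverse @ np.append(world_origin, 1.0))[:3]
    d = (inverse @ np.append(world_dir, 0.0))[:3]
    t = hit_unit_sphere(o, d)
    if t is None:
        return None
    local_hit = o + t * d
    return (forward @ np.append(local_hit, 1.0))[:3]
```

One caveat on the normals mentioned above: surface normals transform with the inverse transpose of the forward matrix, not the forward matrix itself, otherwise non-uniform scales skew them.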

You will need to compute intersections with all objects in the scene and select the hit closest to the ray origin (ignoring hits at negative distance). To speed this up even more, enclose multiple objects in bounding boxes that are very cheap to hit-test and only pass the real-world ray on to the enclosed objects when the box is hit (all objects still use their precomputed matrices).

MarSik
  • Is this the approach used for video? In the case that the scene will be rendered more than once, doesn't computing the transform of every ray per object fail to scale with the number of objects in world space? My naive assumption is that it's more efficient to handle transformation parameters at the intersection checks. – tay10r Nov 23 '19 at 23:03
  • There are many more intersection checks than objects in the scene. So yes, all the caches consume memory to save CPU time. If you are willing to wait longer, you can decide to avoid caching. There are more optimisation options, though: you can split your scene into a (tree of) groups and use simple-to-check bounding boxes. That saves quite a lot of CPU, as ideally only log n objects need to be tested for hits. – MarSik Nov 25 '19 at 08:34