Can raymarching be accelerated under an SIMD architecture?

Question

The answer would seem to be no, because raymarching is highly conditional: each ray follows its own execution path, since at every step we check opacity, termination and so on, and the outcome varies with the direction of the individual ray.

So it would seem that SIMD would largely not be able to accelerate this; rather, MIMD would be required for acceleration.

Does this make sense? Or am I missing something(s)?

score 1 · Accepted Answer · edited May 23 '17 at 11:43

As stated already, you could probably get a speedup by implementing your vector math with SSE instructions (be aware of the effects discussed here - also for the other approach). This approach would allow the code to stay concise and maintainable.
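A minimal sketch of what "vector math using SSE" can look like for a single ray, assuming x, y, z are stored in lanes 0 to 2 of a `__m128` and lane 3 is unused padding. The helper names `rayAt` and `dot3` are hypothetical, chosen only for illustration:

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics

// Position along a ray: p = o + t * d (lane 3 is padding and is ignored).
inline __m128 rayAt(__m128 o, __m128 d, float t) {
    return _mm_add_ps(o, _mm_mul_ps(_mm_set1_ps(t), d));
}

// 3-component dot product using only SSE1 shuffles and scalar adds.
inline float dot3(__m128 a, __m128 b) {
    __m128 m = _mm_mul_ps(a, b);  // (ax*bx, ay*by, az*bz, aw*bw)
    __m128 y = _mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 1, 1, 1));  // broadcast lane 1
    __m128 z = _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 2, 2, 2));  // broadcast lane 2
    return _mm_cvtss_f32(_mm_add_ss(_mm_add_ss(m, y), z));     // lane0 + lane1 + lane2
}
```

Note that `_mm_set_ps` takes its arguments in the order (w, z, y, x). With this layout one of the four lanes is wasted, which is part of why the gain from this approach is limited.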

I assume, however, that your question is about "packet traversal" (or something like it), in other words processing several rays at once, with each SIMD lane holding a scalar value from a different ray:

In principle this should be possible by deferring the shading to another pass. Once the bare marching pass terminates for a ray, its temporary result is stored as input for the shading pass and its SIMD lane can be repopulated with a new ray. This lets you parallelize a certain, case-dependent percentage of your code while exploiting all four SIMD lanes. Tiling the image and indexing the rays within each tile in Morton order might also be a good idea, to reduce cache pressure (unless your geometry is strictly procedural).

You won't know whether it pays off unless you try. My guess is that, if it does, the speedup might not be worth the added code complexity for just four lanes.
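The lane-repopulation idea can be sketched roughly as follows. This is only an illustration under strong assumptions: the scene is a unit sphere at the origin, rays are orthographic (origin (x, y, -3), direction +z), marching is sphere tracing, and `MarchResult` and `marchPackets` are hypothetical names. The distance step runs on all four lanes at once; termination checks and refilling are done per lane in scalar code, and shading would happen in a later pass over the returned results.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical output of the marching pass, input to a separate shading pass.
struct MarchResult { float t; bool hit; };

std::vector<MarchResult> marchPackets(const std::vector<float>& xs,
                                      const std::vector<float>& ys) {
    assert(xs.size() == ys.size());
    const float kEps = 1e-4f, kTMax = 10.0f;
    const int kMaxSteps = 64;
    const std::size_t n = xs.size();
    std::vector<MarchResult> out(n, MarchResult{0.0f, false});

    // Structure-of-arrays packet state: one entry per SIMD lane.
    float ox[4] = {0, 0, 0, 0}, oy[4] = {0, 0, 0, 0};
    float t[4] = {0, 0, 0, 0}, dist[4] = {0, 0, 0, 0};
    int steps[4] = {0, 0, 0, 0};
    long id[4] = {-1, -1, -1, -1};  // ray index per lane, -1 marks an idle lane
    std::size_t next = 0;           // next ray waiting to be marched
    int active = 0;                 // number of lanes holding a live ray

    auto refill = [&](int lane) {
        if (next < n) {
            ox[lane] = xs[next];
            oy[lane] = ys[next];
            t[lane] = 0.0f;
            steps[lane] = 0;
            id[lane] = static_cast<long>(next++);
            ++active;
        } else {
            id[lane] = -1;  // queue is empty, lane stays idle
        }
    };
    for (int l = 0; l < 4; ++l) refill(l);

    while (active > 0) {
        // One sphere-tracing step for all four lanes at once.
        // Idle lanes compute values that are simply ignored below.
        __m128 tv = _mm_loadu_ps(t);
        __m128 px = _mm_loadu_ps(ox);
        __m128 py = _mm_loadu_ps(oy);
        __m128 pz = _mm_add_ps(_mm_set1_ps(-3.0f), tv);
        __m128 len2 = _mm_add_ps(_mm_add_ps(_mm_mul_ps(px, px), _mm_mul_ps(py, py)),
                                 _mm_mul_ps(pz, pz));
        __m128 d = _mm_sub_ps(_mm_sqrt_ps(len2), _mm_set1_ps(1.0f));
        _mm_storeu_ps(dist, d);
        _mm_storeu_ps(t, _mm_add_ps(tv, d));

        // Scalar bookkeeping: store finished rays and repopulate their lanes.
        for (int l = 0; l < 4; ++l) {
            if (id[l] < 0) continue;
            ++steps[l];
            const bool hit = dist[l] < kEps;
            const bool miss = t[l] > kTMax || steps[l] >= kMaxSteps;
            if (hit || miss) {
                out[static_cast<std::size_t>(id[l])] = MarchResult{t[l], hit};
                --active;
                refill(l);
            }
        }
    }
    return out;
}
```

How much this buys depends on how expensive the distance evaluation is relative to the scalar bookkeeping, which is why measuring is the only way to know.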

Have you considered using an SIMT architecture such as a programmable GPU? A reasonably up-to-date programmable graphics card lets you perform raymarching at interactive rates (see it happen in your browser here).

Just to [add to your answer](http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html) for others who are uncertain of the differences between SIMT & SIMD. — Engineer, Sep 02 '12 at 11:42

score 1 · Answer 2 · answered Mar 15 '12 at 13:57

Over the last few days I built a software raymarcher for a Menger sponge. At the moment it uses neither SIMD nor any special algorithm. I just step from -1 to 1 in X and Y, which are the U and V coordinates of the destination texture. From a camera position and a target I then calculate the increment vector for the raymarch.

After that I run a fixed number of iterations, in which only one branch decides whether there is an intersection with the fractal volume. So if my camera eye is E and my direction vector is D, I have to find the smallest t. Once I have found it, or reached a maximal distance, I break out of the loop. At the end I have t, and from that I calculate the fragment color.

In my opinion it should be possible to parallelize these operations with SSE1/2, because the branch can be resolved by zeroing the corresponding lane in the vector (__m64 / __m128), so that further SIMD operations have no effect on it. It really depends on what you raymarch or raycast, but if you just calculate a fragment color from a function (as with my fractal here) and don't access memory non-linearly, there are some tricks to make it possible.

Admittedly, this answer contains speculation, but I will keep you informed once I've parallelized this routine.
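For reference, a scalar loop of the kind described here might look like the sketch below. It is not the poster's code: a unit sphere stands in for the fractal test, and `Vec3`, `insideVolume` and `marchFixed` are hypothetical names.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Stand-in for the fractal membership test (assumption): unit sphere at the origin.
inline bool insideVolume(const Vec3& p) {
    return p.x * p.x + p.y * p.y + p.z * p.z <= 1.0f;
}

// Samples p = E + t*D at t = 0, dt, 2*dt, ... and returns the smallest sampled t
// that lies inside the volume, or -1 if none is found within maxIter steps or tMax.
float marchFixed(const Vec3& E, const Vec3& D, float dt, int maxIter, float tMax) {
    assert(dt > 0.0f && maxIter > 0);
    for (int i = 0; i < maxIter; ++i) {
        const float t = static_cast<float>(i) * dt;
        if (t > tMax) break;  // reached the maximal distance
        const Vec3 p{E.x + t * D.x, E.y + t * D.y, E.z + t * D.z};
        if (insideVolume(p)) return t;  // the single intersection branch
    }
    return -1.0f;
}
```

The returned t (or the miss marker) is then all that is needed to compute the fragment color.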
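One way to read the "zero the lane" trick is as masking: a comparison produces an all-ones or all-zeros bit pattern per lane, and AND-ing the step with that mask makes finished lanes stop advancing. The sketch below marches four rays at once with a fixed step. It assumes a unit sphere as a stand-in for the fractal and a structure-of-arrays layout, and `marchFixed4` is a hypothetical name.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics
#include <cassert>

// Each input array holds one component for four rays. outT[l] receives the first
// sampled t at which ray l is inside the volume, or -1 if it never is.
void marchFixed4(const float ex[4], const float ey[4], const float ez[4],
                 const float dx[4], const float dy[4], const float dz[4],
                 float dt, int maxIter, float outT[4]) {
    assert(dt > 0.0f && maxIter > 0);
    const __m128 Ex = _mm_loadu_ps(ex), Ey = _mm_loadu_ps(ey), Ez = _mm_loadu_ps(ez);
    const __m128 Dx = _mm_loadu_ps(dx), Dy = _mm_loadu_ps(dy), Dz = _mm_loadu_ps(dz);
    const __m128 step = _mm_set1_ps(dt);
    const __m128 one = _mm_set1_ps(1.0f);

    __m128 t = _mm_setzero_ps();
    __m128 active = _mm_cmpeq_ps(t, t);   // all bits set: every lane is live
    __m128 result = _mm_set1_ps(-1.0f);   // miss marker until a lane hits

    // Stop early once every lane has finished.
    for (int i = 0; i < maxIter && _mm_movemask_ps(active) != 0; ++i) {
        const __m128 px = _mm_add_ps(Ex, _mm_mul_ps(t, Dx));
        const __m128 py = _mm_add_ps(Ey, _mm_mul_ps(t, Dy));
        const __m128 pz = _mm_add_ps(Ez, _mm_mul_ps(t, Dz));
        const __m128 len2 = _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(px, px), _mm_mul_ps(py, py)), _mm_mul_ps(pz, pz));

        // Lanes that are still live and are now inside the volume.
        const __m128 hit = _mm_and_ps(_mm_cmple_ps(len2, one), active);
        // Branch-free select: result = hit ? t : result.
        result = _mm_or_ps(_mm_and_ps(hit, t), _mm_andnot_ps(hit, result));
        // Retire the lanes that hit.
        active = _mm_andnot_ps(hit, active);
        // Finished lanes get a zeroed step, so t no longer changes for them.
        t = _mm_add_ps(t, _mm_and_ps(step, active));
    }
    _mm_storeu_ps(outT, result);
}
```

The loop still runs until the slowest of the four rays finishes, so the benefit shrinks when neighbouring rays terminate at very different depths.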

score 0 · Answer 3 · answered Feb 05 '12 at 13:30


Only insofar as SSE, for instance, lets you do operations on vectors in parallel.

answered by FeepingCreature
