
I am wondering if it is possible in Cuda or Optix to accelerate the computation of the minimum and maximum value along a line/ray cast from one point to another in a 3D volume.

If not, is there any special hardware on NVIDIA GPUs that can accelerate this function (particularly on Volta GPUs or Tesla K80s)?

MVTC
  • In CUDA it would only be possible if you write code for it yourself. CUDA is a general purpose compute API and nothing more. There are no raycasting facilities built-in to CUDA and none of the potential hardware accelerators (if such things actually exist) are exposed in the API or the language – talonmies Apr 07 '21 at 01:16
  • @talonmies But I was asking specifically about Optix, which is Nvidia's accelerated ray tracing API that internally uses Cuda and RTX technology if available. On wikipedia, it says that RTX runs on Volta, which is my target architecture, and accelerates ray tracing by means of Tensor cores. But I see that Optix accelerates ray casting, not just by means of special hardware, but also algorithms. Still my question stands, whether I can use Optix for the mentioned task. As I understand it, you can use Optix kernels together in combination with Cuda kernels. – MVTC Apr 07 '21 at 07:08
  • So when you wrote "if it is possible in Cuda or Optix" you didn't mean CUDA, you meant Optix? – talonmies Apr 07 '21 at 07:16
  • I wasn't sure if there is a technique that can be used in Cuda besides using Optix, so I left that open. There is special hardware on GPUs to accelerate certain things, like trilinear interpolation as an example. – MVTC Apr 07 '21 at 07:29
  • RTX GPUs provide hardware accelerated ray tracing. The principal RTX engine provides hardware acceleration for determining the first thing a ray hits in a BVH. According to what I see as a common definition of "ray-casting", that seems to be the definition. I'm not sure what Min/Max ray casting is, but based on your definition here it would probably involve multiple ray traversals in the RTX engine. None of this is exposed in CUDA (there is no way in CUDA to access the RTX engine) however it is exposed in Optix. – Robert Crovella Apr 22 '21 at 01:26
  • I'd personally be surprised if anyone at NVIDIA described a volta processor as providing "hardware accelerated ray-tracing" but of course that doesn't preclude Optix from running on a volta processor. It just means that instead of using the RTX engine for ray casting (as I have described here), an implementation is provided using "ordinary" CUDA code, perhaps including TensorCore usage. TensorCore is a hardware matrix-multiply engine. If you feel that constitutes "hardware accelerated ray tracing" then I'm not going to argue it. – Robert Crovella Apr 22 '21 at 01:30

1 Answer


The short answer to the title question is: yes, hardware accelerated ray casting is available in CUDA & OptiX. The longer question has multiple interpretations, so I'll try to outline the different possibilities.

The different axes of your question that I'm seeing are: CUDA vs OptiX, pre-RTX GPUs vs RTX GPUs (e.g., Volta vs Ampere), min ray queries vs max ray queries, and possibly surface representations vs volume representations.

pre-RTX vs RTX GPUs:

To perhaps state the obvious, a K80 or a GV100 GPU can be used to accelerate ray casting compared to a CPU, due to the highly parallel nature of the GPU. However, these pre-RTX GPUs don't have any hardware that is specifically dedicated to ray casting. There are bits of somewhat special-purpose hardware, not dedicated to ray casting, that you could probably leverage in various ways, but it is up to you to identify and design these kinds of hardware acceleration hacks.

The RTX GPUs starting with the Turing architecture do have specialized hardware dedicated to ray casting, so they accelerate ray queries even further than the acceleration you get from using just any GPU to parallelize the ray queries.

CUDA vs OptiX:

CUDA can be used for parallel ray tracing on any GPUs, but it does not currently (as I write this) support access to the specialized RTX hardware for ray tracing. When using CUDA, you would be responsible for writing all the code to build an acceleration structure (e.g. BVH) & traverse rays through the acceleration structure, and you would need to write the intersection and shading or hit-processing programs.
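To give a sense of the traversal code you would own in that case, here is a minimal sketch of the ray/box "slab" test that BVH traversal typically runs at every node. It is written as plain C++ for readability; the same function body could be marked `__device__` and called from a CUDA kernel. The names `Vec3`, `Aabb`, and `segmentHitsAabb` are made up for this illustration and do not come from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 lo, hi; };  // axis-aligned box, lo <= hi on every axis

// Slab test: clips the parametric range [tMin, tMax] of the ray o + t*d
// against the box one axis at a time. Returns true and writes the
// overlapping range to tEnter/tExit if any part of the range is inside.
bool segmentHitsAabb(const Vec3& o, const Vec3& d, const Aabb& box,
                     float tMin, float tMax, float& tEnter, float& tExit)
{
    assert(tMin <= tMax);
    const float org[3] = {o.x, o.y, o.z};
    const float dir[3] = {d.x, d.y, d.z};
    const float lo[3]  = {box.lo.x, box.lo.y, box.lo.z};
    const float hi[3]  = {box.hi.x, box.hi.y, box.hi.z};

    for (int a = 0; a < 3; ++a) {
        if (dir[a] == 0.0f) {
            // Ray is parallel to this slab: it either stays inside or never enters.
            if (org[a] < lo[a] || org[a] > hi[a]) return false;
            continue;
        }
        float t0 = (lo[a] - org[a]) / dir[a];
        float t1 = (hi[a] - org[a]) / dir[a];
        if (t0 > t1) std::swap(t0, t1);   // order the two plane crossings
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMin > tMax) return false;    // ranges no longer overlap
    }
    tEnter = tMin;
    tExit  = tMax;
    return true;
}
```

A full CUDA ray caster would call something like this while walking a BVH with an explicit stack, and then run a primitive intersection test at the leaves.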

OptiX, DirectX, and Vulkan all allow you to access the specialized ray-tracing hardware in RTX GPUs. By using these APIs, one can achieve higher speeds with lower power requirements, and they also require much less effort because the intersections and ray traversal through an acceleration structure are provided for you. These APIs also provide other commonly needed features for production-level ray casting, things like instancing, transforms, motion blur, as well as a single-threaded programming model for processing ray hits & misses.

Min vs Max ray queries:

OptiX has built-in functionality to return the surface intersection closest to the ray origin, i.e. a 'min query'. OptiX does not provide a similar single query for the furthest intersection (which is what I assume you mean by "max"). To find the maximum distance hit, or the closest hit to a second point on your ray, you would need to visit multiple hits along the ray (for example in an any-hit program) and keep track of the one that you want.
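The bookkeeping for "keep the hit you want" is small. The sketch below is a CPU stand-in, not OptiX code: it loops over a list of spheres, computes every intersection distance along the ray, and records both the nearest and the farthest one within a range. `Sphere`, `HitRange`, and `nearestAndFarthestHit` are hypothetical names chosen for this example; in OptiX the per-hit update would live in your own hit program instead of this loop.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 c; float r; };
struct HitRange { bool any; float tNear; float tFar; };

static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Visits every ray/sphere intersection with 0 <= t <= tMax and keeps the
// smallest and largest t seen. The direction d need not be normalized.
HitRange nearestAndFarthestHit(const Vec3& o, const Vec3& d,
                               const std::vector<Sphere>& spheres, float tMax)
{
    assert(tMax >= 0.0f);
    HitRange out{false, tMax, 0.0f};
    const float a = dot(d, d);
    assert(a > 0.0f);

    for (const Sphere& s : spheres) {
        const Vec3 oc{o.x - s.c.x, o.y - s.c.y, o.z - s.c.z};
        const float h = dot(oc, d);                  // half of the linear coefficient
        const float c = dot(oc, oc) - s.r * s.r;
        const float disc = h * h - a * c;
        if (disc < 0.0f) continue;                   // ray misses this sphere
        const float root = std::sqrt(disc);
        const float ts[2] = {(-h - root) / a, (-h + root) / a};
        for (float t : ts) {
            if (t < 0.0f || t > tMax) continue;      // outside the queried range
            if (!out.any || t < out.tNear) out.tNear = t;
            if (!out.any || t > out.tFar)  out.tFar  = t;
            out.any = true;
        }
    }
    return out;
}
```

Setting `tMax` to the distance between your two points restricts the query to the segment between them, which matches the point-to-point case in the question.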

In CUDA you're on your own for implementing both min and max queries, so you can do whatever you want as long as you can write all the code.

Surfaces vs Volumes:

Your question mentioned a "3D volume", which has multiple meanings, so just to clarify things:

OptiX (+ DirectX + Vulkan) are APIs for ray tracing of surfaces, for example triangle meshes. The RTX specialty hardware is dedicated to accelerating ray tracing of surface-based representations.

If your "3D volume" is referring to a volumetric representation such as voxel data or a tetrahedral mesh, then surface-based ray tracing might not be the fastest or most appropriate way to cast ray queries. In this case, you might want to use "ray marching" techniques in CUDA, or look at volumetric ray casting APIs for GPUs like NanoVDB.
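If the volume is a dense voxel grid, the ray-marching version of the original min/max question can be sketched as follows. This is plain C++ with assumptions that are mine, not the question's: the grid is stored x-fastest in a flat array, the two end points are given in voxel coordinates, sampling is nearest-neighbor, and the step count is fixed. `Volume`, `MinMax`, and `minMaxAlongSegment` are illustrative names. In CUDA the loop body would typically become one thread per ray.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

struct Volume {
    int nx, ny, nz;
    std::vector<float> data;  // size nx*ny*nz, x varies fastest
    float at(int x, int y, int z) const {
        return data[(static_cast<std::size_t>(z) * ny + y) * nx + x];
    }
};

struct MinMax { float lo, hi; };

// Rounds a continuous coordinate to the nearest voxel index, clamped to the grid.
static int nearestIndex(float v, int n) {
    const int i = static_cast<int>(std::floor(v + 0.5f));
    return std::max(0, std::min(n - 1, i));
}

// Samples the volume at steps+1 evenly spaced points from p0 to p1
// (both included) and returns the smallest and largest sampled value.
MinMax minMaxAlongSegment(const Volume& vol, const Vec3& p0, const Vec3& p1, int steps)
{
    assert(steps >= 1);
    MinMax r{std::numeric_limits<float>::infinity(),
             -std::numeric_limits<float>::infinity()};
    for (int i = 0; i <= steps; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(steps);
        const int ix = nearestIndex(p0.x + t * (p1.x - p0.x), vol.nx);
        const int iy = nearestIndex(p0.y + t * (p1.y - p0.y), vol.ny);
        const int iz = nearestIndex(p0.z + t * (p1.z - p0.z), vol.nz);
        const float v = vol.at(ix, iy, iz);
        r.lo = std::min(r.lo, v);
        r.hi = std::max(r.hi, v);
    }
    return r;
}
```

Fixed-step sampling can skip voxels that the segment only clips, so the result is approximate unless the step is small relative to the voxel size. A voxel-exact grid traversal (a 3D DDA) is the usual alternative when every crossed voxel must be visited.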

David