How is a rasterizer on a GPU arranged?

Question

While developing a ray tracer based on a k-d tree acceleration structure, I ran into a problem. Sometimes, in the ray/triangle intersection test (using the Möller–Trumbore algorithm), a ray misses both of two adjacent triangles. This causes bright dots to flicker between dark triangles (or vice versa, for any contrasting pair) during slow motion across the view axis, especially for high-polygon models.

Assume that the ray/triangle intersection test is the hottest spot of the entire ray tracer (this is the result of thorough, full-scale benchmarking). Möller–Trumbore is the fastest algorithm on the GPU (I use a GPU, not a CPU). Because of the SAH (surface area heuristic), about half of the whole frame time is spent in ray/triangle intersection tests.

To avoid the flickering, I simply widen the triangles slightly during the ray/triangle intersection test, at the step where the barycentric coordinates are calculated and compared. This is done at runtime.

For the triangles to be widened correctly, I do the following: each calculated u, v or w is multiplied by the length of the corresponding height of the triangle and compared with -EPSILON or 1.0 + EPSILON, measured in physical units (say, 0.0001 metres if the heights are measured in metres).
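Below is a minimal C++ sketch of such a widened test. It assumes that u and v come from Möller–Trumbore, so u weights B, v weights C, and w = 1 - u - v weights A, and that the three heights are already known. The function name `insideWidened` and the constant `kEpsilon` are hypothetical. A barycentric coordinate multiplied by the matching height is the signed distance from the hit point to the opposite edge. Comparing that product with -kEpsilon therefore widens every edge by the same physical distance. This sketch checks only the lower bounds. Because u + v + w = 1, those bounds already limit each coordinate from above.

```cpp
#include <cassert>

// Hypothetical widening tolerance in world units (e.g. metres).
constexpr float kEpsilon = 1e-4f;

// Returns true if the point with barycentric coordinates (u, v, w = 1 - u - v)
// lies inside the triangle widened by kEpsilon along every edge.
// Assumed convention (Moller-Trumbore): u weights B, v weights C, w weights A.
// hB, hC, hA: heights dropped from vertices B, C, A onto their opposite sides.
bool insideWidened(float u, float v, float hB, float hC, float hA) {
    assert(hA > 0.0f && hB > 0.0f && hC > 0.0f);  // non-degenerate triangle
    float w = 1.0f - u - v;
    // coordinate * height = signed distance from the point to the opposite edge
    return u * hB >= -kEpsilon &&
           v * hC >= -kEpsilon &&
           w * hA >= -kEpsilon;
}
```

A point that lies 0.00005 units outside an edge is still accepted. The same coordinate on a triangle with a ten times larger height is rejected, because the physical distance is then ten times larger.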

To calculate all three heights of the triangle, I need its area and the length of each side. For triangle ABC, with AB = B - A and AC = C - A, length(cross(AB, AC)) is twice the area. The sides are length(AB), length(AC) and, with BC = AC - AB, length(BC). Here length(vec) is sqrt(dot(vec, vec)) under the hood (I mention this to show how expensive it is to compute). The square roots can easily be avoided, but this widening step still takes about 10% of the whole frame time. So there is a trade-off between correctness and runtime speed.
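The height computation can be sketched in C++ as follows. `Vec3`, `Heights`, `triangleHeights` and `withinWidenedEdge` are hypothetical names. Because length(cross(AB, AC)) is twice the triangle's area, each height equals that length divided by the side opposite the vertex. The second function shows one possible way to drop the square roots; it is an assumption, not necessarily the author's method. For a negative coordinate, coord * height >= -eps is equivalent to coord² · |n|² <= eps² · side², where n = cross(AB, AC) and side is the side opposite the vertex.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Heights dropped from A, B, C onto their opposite sides BC, AC, AB.
struct Heights { float hA, hB, hC; };

// |cross(AB, AC)| is twice the area, so height = 2 * area / side = |n| / side.
Heights triangleHeights(Vec3 A, Vec3 B, Vec3 C) {
    Vec3 AB = sub(B, A), AC = sub(C, A), BC = sub(AC, AB);
    Vec3 n = cross(AB, AC);
    float len = std::sqrt(dot(n, n));
    assert(len > 0.0f);  // non-degenerate triangle
    return {len / std::sqrt(dot(BC, BC)),
            len / std::sqrt(dot(AC, AC)),
            len / std::sqrt(dot(AB, AB))};
}

// Square-root-free form of "coord * height >= -eps".
// nn = dot(n, n); sideSq = squared length of the side opposite the vertex.
bool withinWidenedEdge(float coord, float nn, float sideSq, float eps) {
    return coord >= 0.0f || coord * coord * nn <= eps * eps * sideSq;
}
```

For the right triangle A = (0, 0, 0), B = (3, 0, 0), C = (0, 4, 0), this gives hA = 2.4, hB = 3 and hC = 4. Here dot(n, n) = 144 and the side opposite B is AC, with squared length 16.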

Now I recall that a rasterizer has no parameter like EPSILON at all, and its correctness does not depend on round-off errors.

How is a hardware rasterizer arranged? Why does it always give correct results?

My guess is that, while traversing neighbouring triangles, the rasterizer performs its calculations in a uniform manner. The errors then become one-sided and cancel out on both sides of a shared edge.
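For comparison, here is a C++ sketch of the usual edge-function coverage test. It assumes that vertex positions have already been snapped to an integer (fixed-point) subpixel grid, as hardware rasterizers typically do. With integer coordinates the edge function is computed exactly. Evaluating a shared edge from the neighbouring triangle, which walks it in the opposite direction, yields exactly the negated value, so a sample can never fall between the two triangles. Samples lying exactly on the edge (value 0) are assigned to one triangle by a tie-breaking rule; Direct3D calls its version the top-left rule. The counter-clockwise, y-up convention in `ownsEdge` is an assumption and may differ from a given API's screen-space convention. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Vertex/sample position on an integer subpixel grid (snapping assumed done).
struct FixedPt { int64_t x, y; };

// Twice the signed area of (a, b, p). Exact in integer arithmetic, and
// edgeFunction(b, a, p) == -edgeFunction(a, b, p) holds exactly.
int64_t edgeFunction(FixedPt a, FixedPt b, FixedPt p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Tie-break for samples exactly on an edge. A shared edge is walked in
// opposite directions by its two triangles, so exactly one of them owns it.
// Assumed convention: counter-clockwise triangles, y axis pointing up.
bool ownsEdge(FixedPt a, FixedPt b) {
    int64_t dx = b.x - a.x, dy = b.y - a.y;
    return dy < 0 || (dy == 0 && dx < 0);
}

bool edgeCovers(FixedPt a, FixedPt b, FixedPt p) {
    int64_t e = edgeFunction(a, b, p);
    return e > 0 || (e == 0 && ownsEdge(a, b));
}

// Coverage of sample p by the counter-clockwise triangle (v0, v1, v2).
bool covers(FixedPt v0, FixedPt v1, FixedPt v2, FixedPt p) {
    assert(edgeFunction(v0, v1, v2) > 0);  // counter-clockwise, non-degenerate
    return edgeCovers(v0, v1, p) && edgeCovers(v1, v2, p) && edgeCovers(v2, v0, p);
}
```

Split the square (0, 0), (8, 0), (8, 8), (0, 8) along its diagonal into the triangles (0, 0), (8, 0), (8, 8) and (0, 0), (8, 8), (0, 8). Every interior grid sample is then covered by exactly one of the two triangles, including the samples lying exactly on the diagonal.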

Example code (HLSL), where correctness is sacrificed in favor of runtime speed:

```hlsl
// Möller–Trumbore ray/triangle test with back-face culling.
bool KdTriIntersectCheck(TKdTree kdTree,
                         in TRay ray, in float tMin, in float tMax,
                         inout THit hit, in uint face)
{
  float3 A = KdGetVertex(kdTree, hit.triangleIndex, 0);
  float3 AB = KdGetVertex(kdTree, hit.triangleIndex, 1) - A;
  float3 AC = KdGetVertex(kdTree, hit.triangleIndex, 2) - A;
  float3 P = cross(ray.direction, AC);
  float denominator = dot(AB, P);  // determinant; <= 0 means back-facing or parallel
  if (denominator <= 0.0) {
    return false;
  }
  float3 Q = ray.source - A;
  hit.uv.x = dot(Q, P);  // u * denominator (division deferred)
  if ((hit.uv.x < 0.0) || (hit.uv.x > denominator)) {
    return false;
  }
  float3 R = cross(Q, AB);
  hit.uv.y = dot(ray.direction, R);  // v * denominator
  if ((hit.uv.y < 0.0) || (hit.uv.x + hit.uv.y > denominator)) {
    return false;
  }
  hit.uv /= denominator;
  hit.distance = dot(AC, R) / denominator;
  hit.isFront = true;
  // Only the t range is widened by EPSILON; the barycentric tests above have no tolerance.
  return (hit.distance >= tMin - EPSILON) && (hit.distance <= tMax + EPSILON);
}
```

Other useful links: 1) [MS](https://msdn.microsoft.com/ru-ru/library/windows/desktop/cc627092(v=vs.85).aspx), 2) [NVIDIA](http://research.nvidia.com/publication/high-performance-software-rasterization-gpus) (comment by Tomilov Anatoliy, Mar 22 '18 at 06:35)
