Raycasting (Mouse Picking) while using a Perspective vs. Orthographic Projection in OpenGL

Question

I am struggling to understand how to change my algorithm so that raycasting (used for mouse picking) works with both a perspective projection and an orthographic projection.

Currently I have a scene with 3D objects that have axis-aligned bounding boxes attached to them.

While rendering the scene using a perspective projection (created with glm::perspective) I can successfully use raycasting and my mouse to "pick" different objects in my scene. Here is a demonstration.

If I render the same scene using an orthographic projection, with the camera positioned above the scene and facing down (looking down the Y axis, like a level editor for a game), I am unable to raycast correctly from where the user clicks on the screen, so mouse picking does not work while rendering with an orthographic projection. Here is a demonstration of it not working.

My algorithm at a high level:

auto const coords = mouse.coords();
glm::vec2 const mouse_pos{coords.x, coords.y};

glm::vec3 ray_dir, ray_start;
if (perspective) { // This "works"
      auto const ar  = aspect_rate;
      auto const fov = field_of_view;

      glm::mat4 const proj_matrix = glm::perspective(fov, ar, f.near, f.far);
      auto const& target_pos      =  camera.target.get_position();
      glm::mat4 const view_matrix = glm::lookAt(target_pos, target_pos, glm::vec3{0, -1, 0});

      ray_dir   = Raycast::calculate_ray_into_screen(mouse_pos, proj_matrix, view_matrix, view_rect);
      ray_start = camera.world_position();
}
else if (orthographic) { // This "doesn't work"
      glm::vec3 const POS     = glm::vec3{50};
      glm::vec3 const FORWARD = glm::vec3{0, -1, 0};
      glm::vec3 const UP      = glm::vec3{0, 0, -1};

      // 1024, 768 with NEAR 0.001 and FAR 10000
      //glm::mat4 proj_matrix = glm::ortho(0, 1024, 0, 768, 0.0001, 10000);
      glm::mat4 proj_matrix = glm::ortho(0, 1024, 0, 768, 0.0001, 100);
      // Look down at the scene from above  
      glm::mat4 view_matrix = glm::lookAt(POS, POS + FORWARD, UP);
      // convert the mouse screen coordinates into world coordinates for the cube/ray test
      auto const p0 = screen_to_world(mouse_pos, view_rect, proj_matrix, view_matrix, 0.0f);
      auto const p1 = screen_to_world(mouse_pos, view_rect, proj_matrix, view_matrix, 1.0f);

      ray_start = p0;
      ray_dir = glm::normalize(p1 - p0);
    }
bool const intersects = ray_intersects_cube(logger, ray_dir, ray_start,
                                                eid, tr, cube, distances);

In perspective mode, we cast a ray into the scene and see if it intersects with the cube surrounding the object.

In orthographic mode, I'm casting two rays from the screen (one at z=0, the other at z=1) and creating a ray between those two points. I set the ray start point to where the mouse pointer is (with z=0) and use the ray direction just calculated as inputs into the same ray_cube_intersection algorithm.

My question is this

Since the MousePicking works using the Perspective projection, but not using an Orthographic projection:

Is it reasonable to assume the same ray_cube intersection algorithm can be used with a perspective/orthographic projection?
Is my thinking about setting the ray_start and ray_dir variables in the orthographic case correct?

Here is the source for the ray/cube collision algorithm in use.

glm::vec3
Raycast::calculate_ray_into_screen(glm::vec2 const& point, glm::mat4 const& proj,
                                   glm::mat4 const& view, Rectangle const& view_rect)
{
  // When doing mouse picking, we want our ray to be pointed "into" the screen
  float constexpr Z            = -1.0f;
  return screen_to_world(point, view_rect, proj, view, Z);
}

bool
ray_cube_intersect(Ray const& r, Transform const& transform, Cube const& cube,
    float& distance)
{
  auto const& cubepos = transform.translation;

  glm::vec3 const                minpos = cube.min * transform.scale;
  glm::vec3 const                maxpos = cube.max * transform.scale;
  std::array<glm::vec3, 2> const bounds{{minpos + cubepos, maxpos + cubepos}};

  float txmin = (bounds[    r.sign[0]].x - r.orig.x) * r.invdir.x;
  float txmax = (bounds[1 - r.sign[0]].x - r.orig.x) * r.invdir.x;
  float tymin = (bounds[    r.sign[1]].y - r.orig.y) * r.invdir.y;
  float tymax = (bounds[1 - r.sign[1]].y - r.orig.y) * r.invdir.y;

  if ((txmin > tymax) || (tymin > txmax)) {
    return false;
  }
  if (tymin > txmin) {
    txmin = tymin;
  }
  if (tymax < txmax) {
    txmax = tymax;
  }

  float tzmin = (bounds[    r.sign[2]].z - r.orig.z) * r.invdir.z;
  float tzmax = (bounds[1 - r.sign[2]].z - r.orig.z) * r.invdir.z;

  if ((txmin > tzmax) || (tzmin > txmax)) {
    return false;
  }

  distance = tzmin;
  return true;
}
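
For comparison, here is a minimal, self-contained sketch of the same slab test. It is written against a hypothetical `Vec3` struct rather than GLM so that it compiles on its own; `Vec3`, `Ray`, `make_ray`, and `ray_aabb_intersect` are illustrative names, not part of the code above. Unlike the listing above, this variant folds the z slab into the entry distance before reporting it and rejects boxes that lie entirely behind the ray origin. Both are assumptions about the intended result, not something the original code does.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for glm::vec3 so the sketch is self-contained.
struct Vec3 { float x, y, z; };

struct Ray {
  Vec3               orig;
  Vec3               invdir; // 1 / direction, per component
  std::array<int, 3> sign;   // 1 where the inverse direction is negative
};

inline Ray make_ray(Vec3 const& orig, Vec3 const& dir)
{
  // A zero component yields +/-infinity, which the comparisons below tolerate.
  Vec3 const inv{1.0f / dir.x, 1.0f / dir.y, 1.0f / dir.z};
  return Ray{orig, inv, {{inv.x < 0.0f ? 1 : 0, inv.y < 0.0f ? 1 : 0, inv.z < 0.0f ? 1 : 0}}};
}

// Returns true and writes the entry distance when the ray hits the box.
inline bool ray_aabb_intersect(Ray const& r, Vec3 const& bmin, Vec3 const& bmax,
                               float& distance)
{
  std::array<Vec3, 2> const bounds{{bmin, bmax}};

  float tmin = (bounds[r.sign[0]].x - r.orig.x) * r.invdir.x;
  float tmax = (bounds[1 - r.sign[0]].x - r.orig.x) * r.invdir.x;
  float const tymin = (bounds[r.sign[1]].y - r.orig.y) * r.invdir.y;
  float const tymax = (bounds[1 - r.sign[1]].y - r.orig.y) * r.invdir.y;
  if (tmin > tymax || tymin > tmax) {
    return false;
  }
  tmin = std::max(tmin, tymin);
  tmax = std::min(tmax, tymax);

  float const tzmin = (bounds[r.sign[2]].z - r.orig.z) * r.invdir.z;
  float const tzmax = (bounds[1 - r.sign[2]].z - r.orig.z) * r.invdir.z;
  if (tmin > tzmax || tzmin > tmax) {
    return false;
  }
  tmin = std::max(tmin, tzmin); // entry distance across all three slabs
  tmax = std::min(tmax, tzmax);

  if (tmax < 0.0f) {
    return false; // the whole box is behind the ray origin
  }
  distance = tmin;
  return true;
}
```

The test itself does not depend on the projection: it only needs a world-space origin and direction. One caveat of this sketch is that an origin lying exactly on a slab plane with a zero direction component on that axis produces NaN, so a production version may want to handle that case explicitly.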

Edit: the math space-conversion functions I'm using:

namespace boomhs::math::space_conversions
{

inline glm::vec4
clip_to_eye(glm::vec4 const& clip, glm::mat4 const& proj_matrix, float const z)
{
  auto const      inv_proj   = glm::inverse(proj_matrix);
  glm::vec4 const eye_coords = inv_proj * clip;
  return glm::vec4{eye_coords.x, eye_coords.y, z, 0.0f};
}

inline glm::vec3
eye_to_world(glm::vec4 const& eye, glm::mat4 const& view_matrix)
{
  glm::mat4 const inv_view  = glm::inverse(view_matrix);
  glm::vec4 const ray       = inv_view * eye;
  glm::vec3 const ray_world = glm::vec3{ray.x, ray.y, ray.z};
  return glm::normalize(ray_world);
}

inline constexpr glm::vec2
screen_to_ndc(glm::vec2 const& scoords, Rectangle const& view_rect)
{
  float const x = ((2.0f * scoords.x) / view_rect.right()) - 1.0f;
  float const y = ((2.0f * scoords.y) / view_rect.bottom()) - 1.0f;

  auto const assert_fn = [](float const v) {
    assert(v <= 1.0f);
    assert(v >= -1.0f);
  };
  assert_fn(x);
  assert_fn(y);
  return glm::vec2{x, -y};
}

inline glm::vec4
ndc_to_clip(glm::vec2 const& ndc, float const z)
{
  return glm::vec4{ndc.x, ndc.y, z, 1.0f};
}

inline glm::vec3
screen_to_world(glm::vec2 const& scoords, Rectangle const& view_rect, glm::mat4 const& proj_matrix,
                glm::mat4 const& view_matrix, float const z)
{
  glm::vec2 const ndc   = screen_to_ndc(scoords, view_rect);
  glm::vec4 const clip  = ndc_to_clip(ndc, z);
  glm::vec4 const eye   = clip_to_eye(clip, proj_matrix, z);
  glm::vec3 const world = eye_to_world(eye, view_matrix);
  return world;
}

} // namespace boomhs::math::space_conversions

The last parameter of `screen_to_world` seems to be the NDC z coordinate. So shouldn't it be *-1.0f* instead of *0.0f* for `p0`, like in `Raycast::calculate_ray_into_screen`? — Rabbid76, Sep 13 '18 at 07:41
A minor issue: `near, far = 0.0001, 10000` working with floats (up to 7 true digits) makes `far-near=far` which messes the projection matrix — Ripi2, Sep 13 '18 at 16:04
@Ripi2 I adjusted my far to 100, and the scene renders everything correctly. Thank you for your advice about the floating-point imprecision error. I believe adjusting my values such that near, far = 0.001, 100.0 should keep the values within the range expected for the projection matrix. — Short, Sep 13 '18 at 16:27
@Rabbid76 Ah, OK, I thought about this and I think that makes sense. Casting a ray into the screen, I believe the two Z values should be 0 and -1 (as you indicated) for raycasting in this scene (-Z is into the screen). — Short, Sep 13 '18 at 16:29
@Short No. I can't see your implementation, but if `screen_to_world` is a function that "unprojects" a point, then its last parameter is either a depth value in [0.0, 1.0] or a normalized device coordinate (NDC) in [-1.0, 1.0]. Note that in normalized device space the z axis points from the near plane to the far plane, whereas in view space the z axis points out of the viewport. View space comes before the projection is applied. The projection matrix inverts the z axis (in general)! — Rabbid76, Sep 13 '18 at 16:40
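
To make the z-axis point from the comments concrete, the following sketch inverts an orthographic projection by hand. It assumes the matrix convention produced by `glm::ortho(left, right, bottom, top, near, far)` with the default OpenGL NDC range of [-1, 1] on every axis; `Vec3`, `OrthoBounds`, and `ortho_ndc_to_eye` are hypothetical names and not part of GLM. The result is a position in view space, so it should not be normalized the way a direction would be.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for glm::vec3 so the sketch is self-contained.
struct Vec3 { float x, y, z; };

// The six values an orthographic projection is built from (illustrative type).
struct OrthoBounds {
  float left, right, bottom, top, near_plane, far_plane;
};

// Maps a point in normalized device coordinates back to view (eye) space,
// assuming a right-handed view space and NDC in [-1, 1] on all three axes.
inline Vec3 ortho_ndc_to_eye(Vec3 const& ndc, OrthoBounds const& o)
{
  assert(o.right != o.left && o.top != o.bottom && o.far_plane != o.near_plane);

  float const x = 0.5f * (ndc.x * (o.right - o.left) + (o.right + o.left));
  float const y = 0.5f * (ndc.y * (o.top - o.bottom) + (o.top + o.bottom));
  // NDC z grows from near to far, while view-space z grows toward the viewer,
  // hence the sign flip.
  float const z = -0.5f * (ndc.z * (o.far_plane - o.near_plane) + (o.far_plane + o.near_plane));
  return Vec3{x, y, z};
}
```

With this mapping, NDC z = -1 lands on the near plane (view-space z = -near) and NDC z = +1 lands on the far plane (view-space z = -far). Two points unprojected at the same x and y therefore differ only along the view direction.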

AudioGuy · Answer 1 · 2021-03-28T08:45:15.867

I worked on this for several days because I ran into the same problem. The unproject methods we are used to working with behave 100% correctly here as well, even with an orthographic projection. But with an orthographic projection, the direction vector going from the camera position into the screen is always the same, so unprojecting the cursor in the same way does not work as intended in this case.

What you want to do is keep the camera direction vector as it is; to get the ray origin, shift the camera position according to the current mouse position on screen.

My approach (C#, but you'll get the idea):

Vector3 worldUpDirection = new Vector3(0, 1, 0); // if your world is y-up

// Get mouse coordinates (2d) relative to window position:
Vector2 mousePosRelativeToWindow = GetMouseCoordsRelativeToWindow(); // (0,0) would be top left window corner

// get camera direction vector:
Vector3 camDirection = Vector3.Normalize(cameraTarget - cameraPosition);

// Get x and y coordinates relative to the frustum width and height.
// glOrthoWidth and glOrthoHeight are the sizeX and sizeY values
// you created your projection matrix with. If your frustum has a width of 100,
// x becomes -50 when the mouse is at the left edge and +50 at the right edge.
float x = +(2.0f * mousePosRelativeToWindow.X / viewportWidth  - 1) * (glOrthoWidth  / 2);
float y = -(2.0f * mousePosRelativeToWindow.Y / viewportHeight - 1) * (glOrthoHeight / 2);

// Now, you want to calculate the camera's local right and up vectors 
// (depending on the camera's current view direction):
Vector3 cameraRight = Vector3.Normalize(Vector3.Cross(camDirection, worldUpDirection));
Vector3 cameraUp = Vector3.Normalize(Vector3.Cross(cameraRight, camDirection));

// Finally, calculate the ray origin:
Vector3 rayOrigin = cameraPosition + cameraRight * x + cameraUp * y;
Vector3 rayDirection = camDirection;

Now you have the ray origin and the ray direction for your orthographic projection. With these you can run any ray-plane/volume-intersections as usual.
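
The same construction translated to C++ might look like the sketch below. To stay self-contained it uses a small hypothetical `Vec3` type in place of `glm::vec3`; `OrthoRay`, `ortho_pick_ray`, and the parameter names are illustrative. As in the C# version, the orthographic volume is assumed to be centered on the camera, and the mouse position is assumed to have its origin at the top-left corner of the window.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for glm::vec3 so the sketch is self-contained.
struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return Vec3{a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 v, float s) { return Vec3{v.x * s, v.y * s, v.z * s}; }

inline Vec3 cross(Vec3 a, Vec3 b)
{
  return Vec3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

inline Vec3 normalize(Vec3 v)
{
  float const len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  assert(len > 0.0f); // fails if world_up is parallel to the view direction
  return v * (1.0f / len);
}

struct OrthoRay { Vec3 origin; Vec3 direction; };

// Builds a picking ray for an orthographic camera: the direction is fixed,
// and the origin slides across the camera plane with the mouse.
inline OrthoRay ortho_pick_ray(float mouse_x, float mouse_y,
                               float viewport_w, float viewport_h,
                               float ortho_w, float ortho_h,
                               Vec3 cam_pos, Vec3 cam_target, Vec3 world_up)
{
  Vec3 const dir = normalize(cam_target - cam_pos);

  // Offsets in world units, measured from the center of the ortho volume.
  float const x = +(2.0f * mouse_x / viewport_w - 1.0f) * (ortho_w * 0.5f);
  float const y = -(2.0f * mouse_y / viewport_h - 1.0f) * (ortho_h * 0.5f);

  // Camera-local basis derived from the view direction.
  Vec3 const right = normalize(cross(dir, world_up));
  Vec3 const up    = normalize(cross(right, dir));

  return OrthoRay{cam_pos + right * x + up * y, dir};
}
```

For a top-down camera that looks straight along the Y axis, `world_up` must not be (0, 1, 0), because the cross product with the view direction would be zero. Passing the same up vector used for the view matrix, such as (0, 0, -1), avoids that.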
