
This might be more of a generic graphics programming question, but for now this is within the context of using Apple's Metal framework on macOS.

In an NSView's mouseDown: handler, it's trivial to get the local coordinates of where the mouse-down event took place by simply calling:

NSPoint localPoint = [self convertPoint:event.locationInWindow fromView:nil];

Given that local point, what are the steps required to determine where the mouse down occurred within the context of a rendered scene?

For now, I'm simply rendering a 2D plane in an MTKView. The plane can be scaled, translated, and rotated about the z-axis. I can somewhat brute-force the solution because the scene is so simple, but I'm wondering what the more correct approach is.

It feels as if I would have to duplicate some of the vertex shader logic in my Objective-C code to ensure that all the transforms are correctly applied, but I'm not quite sure how that would work once rotation is applied.

Very few of the Metal tutorials or references talk much about mouse input and how the coordinate systems interact. Any insight would be appreciated.

For example, if the user clicked on the orange plane below, how do you determine the normalized coordinates of that click within that specific object? (In this case, it might be something like [0.8, 0.9].)

MTKView Diagram

kennyc
  • This is commonly called _picking_ or _hit-testing_. It's [been](https://www.opengl.org/archives/resources/faq/technical/selection.htm) [written](http://antongerdelan.net/opengl/raycasting.html) [about](http://schabby.de/picking-opengl-ray-tracing/) [extensively](https://www.mkonrad.net/2014/08/07/simple-opengl-object-picking-in-3d.html), though not often in Metal. The gist of it is that you compute the _inverse_ transformation from screen coordinates into world space (producing a ray), then perform intersection tests between the ray and the bounding volumes of the objects in the world. – warrenm Nov 02 '18 at 20:21
  • Knowing the right search term helps a lot, thank you. – kennyc Nov 02 '18 at 20:34

1 Answer


Prompted in part by this question, I wrote an article on this subject that you may find useful. The sample code is in Swift, but the concepts transfer quite readily.

Here's a sketch of an Objective-C algorithm for transforming from screen-space coordinates to world-space coordinates on macOS:

#import <simd/simd.h> // at the top of the file, for the simd vector/matrix types

// Get viewport dimensions
CGFloat width = view.bounds.size.width;
CGFloat height = view.bounds.size.height;

// Convert from window coordinates to view coordinates,
// then flip the y-axis to match Metal's top-left-origin viewport coordinates
CGPoint location = [view convertPoint:event.locationInWindow fromView:nil];
location.y = height - location.y;

// Compute clip-to-view and view-to-world matrices
simd_float4x4 inverseProjectionMatrix = simd_inverse(projectionMatrix);
simd_float4x4 inverseViewMatrix = simd_inverse(viewMatrix);

// Convert from screen coordinates to clip-space coordinates
float clipX = (2 * location.x) / width - 1;
float clipY = 1 - (2 * location.y) / height;
simd_float4 clipCoords = (simd_float4){ clipX, clipY, 0, 1 };

// Determine direction of picking ray in view space
simd_float4 eyeRayDir = simd_mul(inverseProjectionMatrix, clipCoords);
eyeRayDir.z = -1;
eyeRayDir.w = 0;

// Determine direction of picking ray in world space
simd_float4 worldRayDir = simd_mul(inverseViewMatrix, eyeRayDir);
worldRayDir = simd_normalize(worldRayDir);

// Determine origin of picking ray in world space
simd_float4 eyeRayOrigin = (simd_float4){ 0, 0, 0, 1};
simd_float4 worldRayOrigin = simd_mul(inverseViewMatrix, eyeRayOrigin);

// ...do intersection testing against object bounds...
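
For the single transformed plane in the question, one way to do that last step is to bring the ray into the plane's model space and intersect it with the z = 0 plane there, which also gives you the normalized coordinates within the object. A minimal sketch, assuming a hypothetical modelMatrix holding the plane's scale/rotation/translation, and quad vertices spanning [-0.5, 0.5] in x and y at z = 0 in model space:

// Transform the picking ray into the plane's model space
simd_float4x4 inverseModelMatrix = simd_inverse(modelMatrix);
simd_float4 localRayOrigin = simd_mul(inverseModelMatrix, worldRayOrigin);
simd_float4 localRayDir = simd_mul(inverseModelMatrix, worldRayDir);

// Intersect with the plane z = 0 in model space
if (fabsf(localRayDir.z) > 1e-6f) {
    float t = -localRayOrigin.z / localRayDir.z;
    if (t > 0) {
        simd_float4 hit = localRayOrigin + t * localRayDir;

        // Map from the assumed model-space extents [-0.5, 0.5] to normalized [0, 1]
        float u = hit.x + 0.5f;
        float v = hit.y + 0.5f;

        if (u >= 0 && u <= 1 && v >= 0 && v <= 1) {
            // (u, v) is the normalized position within the plane,
            // e.g. something like (0.8, 0.9) for the click in the question
            NSLog(@"Hit plane at normalized coordinates (%f, %f)", u, v);
        }
    }
}

If more than one object can be hit, you'd repeat this per object and keep the intersection with the smallest positive t.
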
warrenm
  • Exceptional! Thank you for the follow-up and the continued MBE contributions. The matrix math will take some time to understand, but a working example (on macOS no less) is a huge benefit to the community. – kennyc Nov 08 '18 at 07:45