
I'm using both ARKit & Vision, following along with Apple's sample project, "Using Vision in Real Time with ARKit", so I'm not setting up the camera myself; ARKit handles that for me.

Using Vision's VNDetectFaceRectanglesRequest, I'm able to get back a collection of VNFaceObservation objects.
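
For reference, I'm running the request on each frame roughly like this (a trimmed-down sketch in the spirit of Apple's sample; the .right orientation assumes a portrait device):

import ARKit
import Vision

// Runs face detection on an ARFrame's capturedImage.
// The .right orientation assumes a portrait device; adjust as needed.
func detectFaces(in frame: ARFrame, completion: @escaping ([VNFaceObservation]) -> Void) {
    let request = VNDetectFaceRectanglesRequest { request, _ in
        completion(request.results as? [VNFaceObservation] ?? [])
    }
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                        orientation: .right,
                                        options: [:])
    try? handler.perform([request])
}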

Following various guides online, I'm able to transform the VNFaceObservation's boundingBox into a rect I can use on my ViewController's UIView.

The Y-axis is correct when the rect is placed on my UIView in ARKit, but the X-axis is completely off.

// face is an instance of VNFaceObservation
// Scale the normalized boundingBox up to view size, then flip the Y-axis
// (Vision's origin is bottom-left, UIKit's is top-left).
let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -view.frame.height)
let translate = CGAffineTransform.identity.scaledBy(x: view.frame.width, y: view.frame.height)
let rect = face.boundingBox.applying(translate).applying(transform)

What is the correct way to display the boundingBox on screen (in ARKit/UIKit) so that the X and Y axes line up correctly with the detected face rectangle? I can't use self.cameraLayer.layerRectConverted(fromMetadataOutputRect: transformedRect), since I'm not using AVCaptureSession.


Update: Digging into this further, the camera image is 1920 x 1440, and most of it is not displayed in ARKit's screen space. The iPhone XS screen is 375 x 812 points.

After I get Vision's observation boundingBox, I transform it to fit the current view (375 x 812). This isn't working, since the actual displayed width seems to be about 500 points (the left and right sides extend past the screen edges). How do I CGAffineTransform the CGRect bounding box from that wider space (roughly 500 x 812, a total guess) into 375 x 812?
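
As a back-of-the-envelope check (assuming ARKit aspect-fills the camera image into the view, and that the image is rotated for portrait), the displayed width works out closer to 609 points than 500:

// Rough estimate of the camera image's on-screen size under aspect-fill.
// In portrait the 1920 x 1440 image is rotated, so treat it as 1440 x 1920.
let imageSize = CGSize(width: 1440, height: 1920)
let viewSize  = CGSize(width: 375, height: 812)     // iPhone XS, in points

// Aspect-fill uses the larger scale factor so the image covers the view.
let scale = max(viewSize.width / imageSize.width,   // ≈ 0.260
                viewSize.height / imageSize.height) // ≈ 0.423

let displayedSize = CGSize(width: imageSize.width * scale,   // ≈ 609
                           height: imageSize.height * scale) // = 812
// About (609 - 375) / 2 ≈ 117 points hang off each side of the screen,
// which is the cropping that needs to be corrected for on the X-axis.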


1 Answer


The key piece missing here is ARFrame's displayTransform(for:viewportSize:); see Apple's developer documentation for the details.

This function will generate the appropriate transform for a given frame and viewport size (the CGRect of the view you're displaying the image and bounding box in).

func visionTransform(frame: ARFrame, viewport: CGRect) -> CGAffineTransform {
    // Note: statusBarOrientation was deprecated in iOS 13; on newer systems,
    // read the interface orientation from the view's window scene instead.
    let orientation = UIApplication.shared.statusBarOrientation

    // Maps normalized capturedImage coordinates into normalized viewport
    // coordinates, accounting for rotation and aspect-fill cropping.
    let transform = frame.displayTransform(for: orientation,
                                           viewportSize: viewport.size)

    // Scale from normalized coordinates up to viewport points.
    let scale = CGAffineTransform(scaleX: viewport.width,
                                  y: viewport.height)

    // Flip the axis that Vision and UIKit disagree on for this orientation.
    // (Use .identity here, not CGAffineTransform(), which is a zeroed transform.)
    var t = CGAffineTransform.identity
    if orientation.isPortrait {
        t = CGAffineTransform(scaleX: -1, y: 1)
        t = t.translatedBy(x: -viewport.width, y: 0)
    } else if orientation.isLandscape {
        t = CGAffineTransform(scaleX: 1, y: -1)
        t = t.translatedBy(x: 0, y: -viewport.height)
    }

    return transform.concatenating(scale).concatenating(t)
}

You can then use this like so:

let transform = visionTransform(frame: yourARFrame, viewport: yourViewport)
let rect = face.boundingBox.applying(transform)
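
To put the rect on screen, one option (a sketch; faceLayer and sceneView are hypothetical names for a CAShapeLayer overlay and your ARSCNView) is:

// faceLayer is assumed to be a CAShapeLayer already added as a sublayer
// of the ARSCNView's layer; these names are illustrative.
let transform = visionTransform(frame: yourARFrame, viewport: sceneView.bounds)
let rect = face.boundingBox.applying(transform)

DispatchQueue.main.async {
    faceLayer.path = UIBezierPath(rect: rect).cgPath
    faceLayer.strokeColor = UIColor.red.cgColor
    faceLayer.fillColor = UIColor.clear.cgColor
}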
    thanks, I was using displayTransform, but I'm sure I was using it incorrectly. I need some time to try out your solution and if it works, I'll accept it :) – xta Feb 28 '19 at 09:08