
I'm prototyping an app where I use CoreML to identify an object. That gives me a bounding box for the object (which has 4 values all between 0 and 1). I'd like to use the ARDepthData I have access to thanks to having a phone with LiDAR to then measure the distance to that object.

The CVPixelBuffer from sceneView.session.currentFrame?.capturedImage has dimensions 1920 x 1440. The CVPixelBuffer from sceneView.session.currentFrame?.sceneDepth?.depthMap has dimensions 256 x 192.

How do I convert the bounding box of the VNRecognizedObjectObservation so that I can look up the depth data I need to estimate the distance to the object?

mikeazo
1 Answer


Converting the bounds of Vision requests can be tricky. Here is a very helpful article on the subject:

https://machinethink.net/blog/bounding-boxes/

Also, newer than that article, Vision now provides some helpful conversion functions such as VNImageRectForNormalizedRect:

let depthWidth = CVPixelBufferGetWidth(depthMap)    // 256 for the LiDAR depth map
let depthHeight = CVPixelBufferGetHeight(depthMap)  // 192
let boundingBox = observation.boundingBox           // your 0.0-1.0 bounds from Vision
let depthBounds = VNImageRectForNormalizedRect(boundingBox, depthWidth, depthHeight)

It returns image coordinates projected from a rectangle in a normalized coordinate space, which is exactly what you get from Vision.
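
If it helps, here is a rough sketch of how the pieces might fit together: project the normalized bounding box into the depth map, then read the Float32 depth value (in meters) at the center of the resulting rect. depthAtCenter is a hypothetical helper name, and it assumes depthMap is the kCVPixelFormatType_DepthFloat32 buffer from ARFrame.sceneDepth. Untested, so treat it as a starting point:

import Vision
import CoreVideo

// Hypothetical helper: samples the depth (in meters) at the center of a
// normalized Vision bounding box, using the LiDAR depth buffer.
func depthAtCenter(of boundingBox: CGRect, in depthMap: CVPixelBuffer) -> Float? {
    let width = CVPixelBufferGetWidth(depthMap)    // e.g. 256
    let height = CVPixelBufferGetHeight(depthMap)  // e.g. 192

    // Project the normalized Vision rect into depth-map pixel coordinates.
    let depthBounds = VNImageRectForNormalizedRect(boundingBox, width, height)
    let x = Int(depthBounds.midX)
    let y = Int(depthBounds.midY)
    guard x >= 0, x < width, y >= 0, y < height else { return nil }

    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    guard let base = CVPixelBufferGetBaseAddress(depthMap) else { return nil }
    let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)

    // The depth map is kCVPixelFormatType_DepthFloat32: one Float32 per pixel, in meters.
    let row = base.advanced(by: y * bytesPerRow)
    let pixels = row.assumingMemoryBound(to: Float32.self)
    return pixels[x]
}

Depending on how you run the Vision request, you may also need to flip the y-axis (Vision's normalized coordinates have their origin in the lower-left corner, while pixel buffers are indexed from the top) and account for image orientation; the linked article covers those conversions.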

Jeshua Lacock