I am a total beginner in Swift and iOS, and I am trying to:
- Visualise the depth map on the phone screen, instead of the actual video recording.
- Save both the RGB and depth data streams.
I am currently stuck on the first one. I am using ARKit 4 with MetalKit. It seems that I can get the depth data from the frame, but the visualisation I render is really bad. Compared to the ARKit 4 video (https://youtu.be/SpZyxHkmfqE?t=1132 - with timestamp), the quality of my depth map is really low, the colors are different, and distant objects are not shown at all (I do not mean really distant objects: it already fails completely at ~1 m in a static indoor environment). Examples are at the bottom of the question.
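For context on the ~1 m cutoff: ARDepthData.depthMap stores Float32 distances in meters, so a shader that outputs the raw value as a color would be expected to saturate at 1.0, i.e. at about 1 m. Below is a minimal sketch of a normalization step, written as plain CPU code so that it can be checked in isolation. The 0 to 5 m display range and the names `grayLevel`, `near`, and `far` are illustrative assumptions, not part of ARKit.

```swift
import Foundation

/// Maps a depth in meters to an 8-bit gray level: near is bright, far is dark.
/// `near` and `far` describe an assumed display range, not values provided by ARKit.
func grayLevel(forDepth depth: Float, near: Float = 0.0, far: Float = 5.0) -> UInt8 {
    // Invalid readings (NaN, infinity) and degenerate ranges render as black.
    guard depth.isFinite, far > near else { return 0 }
    // Normalize to 0...1 and clamp, so depths outside the range do not overflow.
    let t = min(max((depth - near) / (far - near), 0), 1)
    return UInt8((1 - t) * 255)
}

// Hypothetical depth samples in meters.
let sampleDepths: [Float] = [0.0, 1.0, 2.5, 5.0, 8.0]
let grays = sampleDepths.map { grayLevel(forDepth: $0) }
print(grays)
```

The same division and clamp could be done in the fragment shader instead; the CPU version is only meant to show the arithmetic.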
My ViewController.swift:
import UIKit
import Metal
import MetalKit
import ARKit

extension MTKView : RenderDestinationProvider {
}

class ViewController: UIViewController, MTKViewDelegate, ARSessionDelegate {

    var session: ARSession!
    var configuration = ARWorldTrackingConfiguration()
    var renderer: Renderer!
    var depthBuffer: CVPixelBuffer!
    var confidenceBuffer: CVPixelBuffer!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Set the view's delegate
        session = ARSession()
        session.delegate = self

        // Set the view to use the default device
        if let view = self.view as? MTKView {
            view.device = MTLCreateSystemDefaultDevice()
            view.backgroundColor = UIColor.clear
            view.delegate = self

            guard view.device != nil else {
                print("Metal is not supported on this device")
                return
            }

            // Configure the renderer to draw to the view
            renderer = Renderer(session: session, metalDevice: view.device!, renderDestination: view)
            renderer.drawRectResized(size: view.bounds.size)
        }

        //let tapGesture = UITapGestureRecognizer(target: self, action: #selector(ViewController.handleTap(gestureRecognize:)))
        //view.addGestureRecognizer(tapGesture)
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        // Create a session configuration
        //let configuration = ARWorldTrackingConfiguration()
        configuration.frameSemantics = .sceneDepth

        // Run the view's session
        session.run(configuration)
        UIApplication.shared.isIdleTimerDisabled = true
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)

        // Pause the view's session
        session.pause()
    }

    /*@objc
    func handleTap(gestureRecognize: UITapGestureRecognizer) {
        // Create anchor using the camera's current position
        if let currentFrame = session.currentFrame {
            // Create a transform with a translation of 0.2 meters in front of the camera
            var translation = matrix_identity_float4x4
            translation.columns.3.z = -0.2
            let transform = simd_mul(currentFrame.camera.transform, translation)

            // Add a new anchor to the session
            let anchor = ARAnchor(transform: transform)
            session.add(anchor: anchor)
        }
    }
    */

    // MARK: - MTKViewDelegate

    // Called whenever view changes orientation or layout is changed
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        renderer.drawRectResized(size: size)
    }

    // Called whenever the view needs to render
    func draw(in view: MTKView) {
        renderer.update()
    }

    // MARK: - ARSessionDelegate

    func session(_ session: ARSession, didFailWithError error: Error) {
        // Present an error message to the user
    }

    func sessionWasInterrupted(_ session: ARSession) {
        // Inform the user that the session has been interrupted, for example, by presenting an overlay
    }

    func sessionInterruptionEnded(_ session: ARSession) {
        // Reset tracking and/or remove existing anchors if consistent tracking is required
    }
}
My Renderer.swift (I only modified the functions updateCapturedImageTextures(frame: ARFrame) and drawCapturedImage(renderEncoder: MTLRenderCommandEncoder)):
import Foundation
import Metal
import MetalKit
import ARKit

protocol RenderDestinationProvider {
    var currentRenderPassDescriptor: MTLRenderPassDescriptor? { get }
    var currentDrawable: CAMetalDrawable? { get }
    var colorPixelFormat: MTLPixelFormat { get set }
    var depthStencilPixelFormat: MTLPixelFormat { get set }
    var sampleCount: Int { get set }
}

// The max number of command buffers in flight
let kMaxBuffersInFlight: Int = 3

// The max number anchors our uniform buffer will hold
let kMaxAnchorInstanceCount: Int = 64

// The 256 byte aligned size of our uniform structures
let kAlignedSharedUniformsSize: Int = (MemoryLayout<SharedUniforms>.size & ~0xFF) + 0x100
let kAlignedInstanceUniformsSize: Int = ((MemoryLayout<InstanceUniforms>.size * kMaxAnchorInstanceCount) & ~0xFF) + 0x100

// Vertex data for an image plane
let kImagePlaneVertexData: [Float] = [
    -1.0, -1.0, 0.0, 1.0,
    1.0, -1.0, 1.0, 1.0,
    -1.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 0.0,
]

class Renderer {
    let session: ARSession
    let device: MTLDevice
    let inFlightSemaphore = DispatchSemaphore(value: kMaxBuffersInFlight)
    var renderDestination: RenderDestinationProvider

    // Metal objects
    var commandQueue: MTLCommandQueue!
    var sharedUniformBuffer: MTLBuffer!
    var anchorUniformBuffer: MTLBuffer!
    var imagePlaneVertexBuffer: MTLBuffer!
    var capturedImagePipelineState: MTLRenderPipelineState!
    var capturedImageDepthState: MTLDepthStencilState!
    var anchorPipelineState: MTLRenderPipelineState!
    var anchorDepthState: MTLDepthStencilState!
    var capturedImageTextureY: CVMetalTexture?
    var capturedImageTextureCbCr: CVMetalTexture?

    // Captured image texture cache
    var capturedImageTextureCache: CVMetalTextureCache!

    // Metal vertex descriptor specifying how vertices will be laid out for input into our
    // anchor geometry render pipeline and how we'll layout our Model IO vertices
    var geometryVertexDescriptor: MTLVertexDescriptor!

    // MetalKit mesh containing vertex data and index buffer for our anchor geometry
    var cubeMesh: MTKMesh!

    // Used to determine _uniformBufferStride each frame.
    // This is the current frame number modulo kMaxBuffersInFlight
    var uniformBufferIndex: Int = 0

    // Offset within _sharedUniformBuffer to set for the current frame
    var sharedUniformBufferOffset: Int = 0

    // Offset within _anchorUniformBuffer to set for the current frame
    var anchorUniformBufferOffset: Int = 0

    // Addresses to write shared uniforms to each frame
    var sharedUniformBufferAddress: UnsafeMutableRawPointer!

    // Addresses to write anchor uniforms to each frame
    var anchorUniformBufferAddress: UnsafeMutableRawPointer!

    // The number of anchor instances to render
    var anchorInstanceCount: Int = 0

    // The current viewport size
    var viewportSize: CGSize = CGSize()

    // Flag for viewport size changes
    var viewportSizeDidChange: Bool = false

    var depthTexture: CVMetalTexture?
    var confidenceTexture: CVMetalTexture?
.......................................
    func updateCapturedImageTextures(frame: ARFrame) {
        // Create textures from the frame's depth and confidence maps
        guard let depthData = frame.sceneDepth else { return }

        var pixelBufferDepth: CVPixelBuffer!
        pixelBufferDepth = depthData.depthMap
        var texturePixelFormat: MTLPixelFormat!
        setMTLPixelFormat(&texturePixelFormat, basedOn: pixelBufferDepth)
        depthTexture = createTexture(fromPixelBuffer: pixelBufferDepth, pixelFormat: texturePixelFormat, planeIndex: 0)

        pixelBufferDepth = depthData.confidenceMap
        setMTLPixelFormat(&texturePixelFormat, basedOn: pixelBufferDepth)
        confidenceTexture = createTexture(fromPixelBuffer: pixelBufferDepth, pixelFormat: texturePixelFormat, planeIndex: 0)

        // Create two textures (Y and CbCr) from the provided frame's captured image
        let pixelBuffer = frame.capturedImage
        if (CVPixelBufferGetPlaneCount(pixelBuffer) < 2) {
            return
        }
        capturedImageTextureY = createTexture(fromPixelBuffer: pixelBuffer, pixelFormat:.r8Unorm, planeIndex:0)
        capturedImageTextureCbCr = createTexture(fromPixelBuffer: pixelBuffer, pixelFormat:.rg8Unorm, planeIndex:1)
    }

    func createTexture(fromPixelBuffer pixelBuffer: CVPixelBuffer, pixelFormat: MTLPixelFormat, planeIndex: Int) -> CVMetalTexture? {
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)

        var texture: CVMetalTexture? = nil
        let status = CVMetalTextureCacheCreateTextureFromImage(nil, capturedImageTextureCache, pixelBuffer, nil, pixelFormat, width, height, planeIndex, &texture)

        if status != kCVReturnSuccess {
            texture = nil
        }

        return texture
    }

    func drawCapturedImage(renderEncoder: MTLRenderCommandEncoder) {
        guard let textureY = capturedImageTextureY, let textureCbCr = capturedImageTextureCbCr, let depthTexture = depthTexture, let confidenceTexture = confidenceTexture else {
            return
        }

        // Push a debug group allowing us to identify render commands in the GPU Frame Capture tool
        renderEncoder.pushDebugGroup("DrawCapturedImage")

        // Set render command encoder state
        renderEncoder.setCullMode(.none)
        renderEncoder.setRenderPipelineState(capturedImagePipelineState)
        renderEncoder.setDepthStencilState(capturedImageDepthState)

        // Set mesh's vertex buffers
        renderEncoder.setVertexBuffer(imagePlaneVertexBuffer, offset: 0, index: Int(kBufferIndexMeshPositions.rawValue))

        // Set any textures read/sampled from our render pipeline
        //renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(textureY), index: Int(kTextureIndexY.rawValue))
        //renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(textureCbCr), index: Int(kTextureIndexCbCr.rawValue))
        renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(depthTexture), index: 2)
        //renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(confidenceTexture), index: 3)

        // Draw each submesh of our mesh
        renderEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)

        renderEncoder.popDebugGroup()
    }
}
Everything else is the same as in Xcode's default MetalKit template.
So, am I accessing the data in the wrong way? Do I have some configuration parameters wrong? Am I just rendering the depth map badly? Or does the sensor on the new iPhone really produce such bad data? (It does not look like it, as I have managed to acquire decent 3D point clouds with some apps from the App Store, even at a distance of 3-4 meters.)
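One way to check whether the data itself is fine, independently of the rendering, would be to read the depth values on the CPU and inspect them. The sketch below copies a row-padded Float32 image into a tightly packed array. It assumes the depth buffer is a single-plane Float32 buffer whose rows may be padded; `packedFloats` is a hypothetical helper name, and the Core Video calls that would feed it are only shown in comments.

```swift
import Foundation

/// Copies a Float32 image whose rows may be padded (bytesPerRow >= width * 4)
/// into a tightly packed, row-major array.
func packedFloats(base: UnsafeRawPointer, width: Int, height: Int, bytesPerRow: Int) -> [Float] {
    var out = [Float]()
    out.reserveCapacity(width * height)
    for row in 0..<height {
        // Step by bytesPerRow, not by width, so row padding is skipped.
        let rowStart = base.advanced(by: row * bytesPerRow).assumingMemoryBound(to: Float.self)
        out.append(contentsOf: UnsafeBufferPointer(start: rowStart, count: width))
    }
    return out
}

// Intended use with a depth map (not executed here, requires Core Video):
//   CVPixelBufferLockBaseAddress(depthMap, .readOnly)
//   defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }
//   let values = packedFloats(base: CVPixelBufferGetBaseAddress(depthMap)!,
//                             width: CVPixelBufferGetWidth(depthMap),
//                             height: CVPixelBufferGetHeight(depthMap),
//                             bytesPerRow: CVPixelBufferGetBytesPerRow(depthMap))
```

Printing the minimum and maximum of the resulting array would show whether objects at 3-4 meters are present in the data and only lost during rendering.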
Update: I've figured out that the quality is better if I change renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(depthTexture), index: 2) to renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(depthTexture), index: 1). This is, however, just a random observation, because the documentation is... well, not very extensive. The rendered image is still green-to-white, though, while I want it to be either grayscale or looking like the RGB map shown in the referenced video (that would be perfect, but the grayscale version would be enough).
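One possible explanation for the green-to-white tint, assuming the fragment shader is still the unmodified one from the Xcode template: that shader treats the texture at index 1 as luma (Y), samples chroma (CbCr) from index 2, and applies a YCbCr-to-RGB matrix. With the depth texture bound as Y and no chroma texture bound, the conversion would see Cb = Cr = 0. The sketch below reproduces that arithmetic on the CPU. The matrix coefficients are quoted from memory of the template and should be treated as an assumption, as should the premise that an unbound texture samples as zero.

```swift
import Foundation

/// Full-range YCbCr to RGB conversion with the result clamped to 0...1.
/// Coefficients are assumed to match the Xcode template's fragment shader.
func ycbcrToRGB(y: Float, cb: Float, cr: Float) -> (r: Float, g: Float, b: Float) {
    func clamp(_ v: Float) -> Float { min(max(v, 0), 1) }
    let r = y + 1.4020 * cr - 0.7010
    let g = y - 0.3441 * cb - 0.7141 * cr + 0.5291
    let b = y + 1.7720 * cb - 0.8860
    return (clamp(r), clamp(g), clamp(b))
}

// Depth in meters interpreted as luma, with chroma missing (zero).
let nearPixel = ycbcrToRGB(y: 0.5, cb: 0, cr: 0)  // 0.5 m
let farPixel = ycbcrToRGB(y: 2.0, cb: 0, cr: 0)   // 2.0 m
print(nearPixel, farPixel)
```

Under these assumptions a near pixel comes out pure green and a far pixel saturates to white, which would match the observed ramp. A grayscale result would then need a fragment function that writes the (normalized) depth value to all three color channels instead of going through this conversion.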