I am attempting to learn object detection on iOS so that I can mark the location of a detected object. I have the model trained and added to the project, and my next step was to show an ARView on screen, which works. However, when I turn my Vision processing code on via a button, the image on screen ends up rotated and distorted (most likely just stretched because of an inverted axis).
I found a partial tutorial that I was using as a guide; the author seems to have run into this same issue and solved it, but did not show the solution, and I have no way of contacting them. Their comment was: "one slightly tricky aspect to this was that the coordinate system returned from Vision was different than SwiftUI’s coordinate system (normalized and the y-axis was flipped), but some simple transformations did the trick."
I have no idea which "simple transformations" those were, but I suspect they are simd-related. If anyone has insight into this, I would appreciate help solving the rotation and distortion issue.
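For reference, here is my current (untested) guess at what that transformation might be. The helper name and the `viewSize` parameter are mine, not the tutorial's, and this ignores both the `.right` orientation I pass to Vision in the code below and any aspect-fill cropping the ARView does, either of which may be exactly what I'm getting wrong:

import Vision
import CoreGraphics

// My guess at converting a Vision bounding box (normalized, origin at the
// bottom-left) into view coordinates (points, origin at the top-left).
func convertToViewRect(_ normalizedBox: CGRect, viewSize: CGSize) -> CGRect {
    // Scale the normalized rect up to the view's size.
    let scaled = VNImageRectForNormalizedRect(normalizedBox,
                                              Int(viewSize.width),
                                              Int(viewSize.height))
    // Flip the y-axis: Vision's origin is bottom-left, SwiftUI/UIKit's is top-left.
    return CGRect(x: scaled.minX,
                  y: viewSize.height - scaled.maxY,
                  width: scaled.width,
                  height: scaled.height)
}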
I also get errors in the console as soon as Vision starts, similar to these:
2022-05-12 21:14:39.142550-0400 Find My Apple Remote[66143:9990936] [Assets] Resolving material name 'engine:BuiltinRenderGraphResources/AR/arInPlacePostProcessCombinedPermute7.rematerial' as an asset path -- this usage is deprecated; instead provide a valid bundle
2022-05-12 21:14:39.270684-0400 Find My Apple Remote[66143:9991089] [Session] ARSession <0x111743970>: ARSessionDelegate is retaining 11 ARFrames. This can lead to future camera frames being dropped.
2022-05-12 21:14:40.121810-0400 Find My Apple Remote[66143:9991117] [CAMetalLayer nextDrawable] returning nil because allocation failed.
The one that concerns me the most is the last one.
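On the ARFrames warning: my guess is that it comes from running the Vision request synchronously inside the session delegate callback. I was going to try moving that work onto my own queue, roughly like the sketch below (the class name, the queue label, and the `makeRequest` hook are mine, just to illustrate the idea; it doesn't touch the rotation problem):

import ARKit
import Vision

final class DetectionCoordinator: NSObject, ARSessionDelegate {
    // A serial queue for Vision work so the ARSession delegate returns quickly.
    private let visionQueue = DispatchQueue(label: "visionQueue")
    // Simple flag to drop frames while a request is still running
    // (not worrying about making the flag strictly thread-safe in this sketch).
    private var isProcessing = false
    // Hypothetical hook that builds my VNCoreMLRequest.
    var makeRequest: (() -> VNRequest)?

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        guard !isProcessing, let buildRequest = makeRequest else { return }
        isProcessing = true
        // Keep only the pixel buffer, not the whole ARFrame.
        let pixelBuffer = frame.capturedImage
        visionQueue.async { [weak self] in
            defer { self?.isProcessing = false }
            let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                orientation: .right,
                                                options: [:])
            try? handler.perform([buildRequest()])
        }
    }
}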
My code, so far, is:
struct ContentView: View {
    @State private var isDetecting = false
    @State private var success = false

    var body: some View {
        VStack {
            RealityKitView(isDetecting: $isDetecting, success: $success)
                .overlay(alignment: .top) {
                    Image(systemName: success ? "checkmark.circle" : "slash.circle")
                        .foregroundColor(success ? .green : .red)
                }
            Button {
                isDetecting.toggle()
            } label: {
                Text(isDetecting ? "Stop Detecting" : "Start Detecting")
                    .frame(width: 150, height: 50)
                    .background(
                        Capsule()
                            .fill(isDetecting ? Color.red.opacity(0.5) : Color.green.opacity(0.5))
                    )
            }
        }
    }
}
import SwiftUI
import ARKit
import RealityKit
import Vision
struct RealityKitView: UIViewRepresentable {
    let arView = ARView()
    let scale = SIMD3<Float>(repeating: 0.1)
    let model: VNCoreMLModel? = RealityKitView.returnMLModel()

    @Binding var isDetecting: Bool
    @Binding var success: Bool
    @State var boundingBox: CGRect?

    func makeUIView(context: Context) -> some UIView {
        // Start the AR session
        let session = configureSession()
        // Handle ARSession events via the delegate
        session.delegate = context.coordinator
        return arView
    }

    func configureSession() -> ARSession {
        let session = arView.session
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        config.environmentTexturing = .automatic
        session.run(config)
        return session
    }

    static func returnMLModel() -> VNCoreMLModel? {
        do {
            let detector = try AppleRemoteDetector()
            let model = try VNCoreMLModel(for: detector.model)
            return model
        } catch {
            print("RealityKitView:returnMLModel failed with error: \(error)")
        }
        return nil
    }

    func updateUIView(_ uiView: UIViewType, context: Context) {}

    func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }

    class Coordinator: NSObject, ARSessionDelegate {
        var parent: RealityKitView

        init(_ parent: RealityKitView) {
            self.parent = parent
        }

        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            // Start Vision processing
            if parent.isDetecting {
                guard let model = parent.model else {
                    return
                }
                // I suspect the problem is here, where the image is captured into a buffer
                // and then turned into an input for the Core ML model.
                let pixelBuffer = frame.capturedImage
                let input = AppleRemoteDetectorInput(image: pixelBuffer)
                do {
                    let request = VNCoreMLRequest(model: model) { (request, error) in
                        guard
                            let results = request.results,
                            !results.isEmpty,
                            let recognizedObjectObservation = results as? [VNRecognizedObjectObservation],
                            let first = recognizedObjectObservation.first
                        else {
                            self.parent.boundingBox = nil
                            self.parent.success = false
                            return
                        }
                        self.parent.success = true
                        print("\(first.boundingBox)")
                        self.parent.boundingBox = first.boundingBox
                    }
                    model.featureProvider = input
                    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                        orientation: CGImagePropertyOrientation.right,
                                                        options: [:])
                    try handler.perform([request])
                } catch {
                    print(error)
                }
            }
        }
    }
}
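In case it clarifies what I'm ultimately trying to do (mark the place of the detected object): once the bounding box is converted into view coordinates, I intend to draw it with something like this hypothetical overlay view. My `boundingBox` currently lives inside the representable and doesn't reach ContentView yet, so the `detectedRect` property here is made up:

import SwiftUI

// Hypothetical overlay I plan to attach to RealityKitView once I have a
// bounding box that has already been converted into view coordinates.
struct DetectionOverlay: View {
    let detectedRect: CGRect?   // made-up property: a rect in this view's coordinate space

    var body: some View {
        if let rect = detectedRect {
            Rectangle()
                .stroke(Color.yellow, lineWidth: 2)
                .frame(width: rect.width, height: rect.height)
                .position(x: rect.midX, y: rect.midY)
        }
    }
}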